Identication of the Ferroptosis-Related Long NonCoding RNAs Signature to Improve the Prognosis Prediction in patients with NSCLC

Background: Non-small cell lung cancer (NSCLC) is the most prevalent type of lung carcinoma with an unfavorable prognosis. Ferroptosis, a novel iron-dependent programmed cell death, is involved in the development of multiple cancers. Of note, the prognostic value of ferroptosis-related lncRNAs in NSCLC remains uncertain. Methods: Gene expression proles and clinical information of NSCLC were retrieved from the TCGA database. Ferroptosis-related genes (FRGs) were explored in the FerrDb database and ferroptosis-related lncRNAs (FRGs-lncRNAs) were identied by the correlation analysis and the LncTarD database. Next, The differentially expressed FRGs-lncRNAs were screened and FRGs-lncRNAs associated with the prognosis were explored by univariate Cox regression analysis and Kaplan-Meier survival analysis. Then, an FRGs-lncRNAs signature was constructed by the Lasso-penalized Cox model in the training cohort and veried by internal and external validation. Finally, the potential correlation between risk score, immune response, and chemotherapeutic sensitivity was further investigated. Results: 129 lncRNAs with a potential regulatory relationship with 59 differentially expressed FRGs were found in NSCLC and 10 FRGs-lncRNAs associated with the prognosis of NSCLC were identied (P<0.05). 9 prognostic-related FRGs-lncRNAs (AQP4-AS1, DANCR, LINC00460, LINC00892, LINC00996, MED4-AS1, SNHG7, UCA1, and WWC2-AS2) were used to construct the prognostic model and stratify patients with NSCLC into high- and low-risk groups. Kaplan-Meier analysis demonstrated a worse outcome in patients with high risk (P<0.05). Moreover, a good predictive capacity of this signature in predicting NSCLC prognosis was conrmed by the ROC curve analysis. Additionally, 45 immune checkpoint genes and 8 m6A-related genes were found differentially expressed in the two risk groups, and the sensitivity of 28 chemotherapeutics were identied to be correlated with the risk score. Conclusion: A novel FRGs-lncRNAs signature was successfully constructed, which may contribute to improving the management strategies of NSCLC.


Introduction
Lung cancer ranks top in the incidence of malignancies and imposes an enormous socio-economic burden worldwide [1]. Non-small-cell lung cancer (NSCLC) is the most prevalent subtype of primary lung cancer, among which adenocarcinoma (LUAD) as well as squamous cell carcinoma (LUSC) are the leading histology [2,3]. Although there have been breakthroughs in the targeted therapy and immunotherapy of lung cancer, the long-time survival rate of NSCLC remains unsatisfactory [4][5][6], and most patients will inevitably develop drug resistance. Since most anti-tumor drugs play a therapeutic role by inducing apoptosis of cancer cells, it is of great signi cance to pinpoint novel cell death pathways in NSCLC and discover new directions for identifying treatment strategies and evaluating prognosis.
Ferroptosis, a newly discovered mode of iron-dependent regulated cell death unlike apoptosis, pyroptosis, autophagy and necrosis, is mainly caused by the iron accumulation-mediated lipid peroxidation and exhibits peculiar morphology, genetics, as well as biochemistry features [7]. The development of disease and organisms and anti-tumor drug resistance has also been veri ed to be affected by ferroptosis [8,9].
Long non-coding RNAs (lncRNAs) is a kind of RNA with more than 200 bases in length. Although lncRNAs lack protein-coding ability, it is still considered a target for gene therapy of cancer since it is involved in the tumor occurrence, development as well as metastasis by modulating gene expression at chromatin, transcriptional, and post-transcriptional levels [10,11]. Additionally, some lncRNAs were found to inhibit ferroptosis by acting as competitive endogenous RNA to prevent oxidation in various cancers including lung cancer. [12,13]. Metallothionein 1D pseudogene, a lncRNA, was found to sensitize erastin-induced ferroptosis in NSCLC by modulating the miR-365a-3p/NRF2 axis [14]. A G3BP1-interacting lncRNA was demonstrated to promote ferroptosis via nuclear sequestration of p53 in lung carcinoma [15]. Whereas, the signature of ferroptosis-related lncRNAs and its association with prognosis in NSCLC has not been systematically evaluated.
Herein, we explored the expression of ferroptosis-related genes (FRGs) and ferroptosis-related lncRNAs (FRGs-lncRNAs) in NSCLC and further investigate the relationship between these lncRNAs and the prognosis in NSCLC based on the Cancer Genome Atlas (TCGA) database. Subsequently, a prognostic model was constructed on the basis of FRGs-lncRNAs in the training set and veri ed by internal and external validation. Furthermore, the relationship between risk score, clinicopathological features, immune responses, N6-methyladenosine (m6A) mRNA statues, and chemotherapeutics sensitivity was further elucidated in NSCLC.

Data acquisition and processing
Gene expression pro les and clinical features of TCGA-LUAD (510 tumors and 58 normal) and TCGA-LUSC (496 tumors and 49 normal) were obtained from UCSC Xena (https://xena.ucsc.edu/). The expression pro les of LUAD and LUSC were combined as NSCLC expression matrix, and a total of 1093 patients (986 tumors and 107 normal) were eventually included in the present study after the batch correction by using ComBat function of the R "sva" package. No ethics committee approval and informed consent of patients were required in this study since the data were obtained from a public database.
Identi cation of FRGs 259 FRGs (Table S1) were explored from the "Marker", "Driver" and "Suppressor" modules of the FerrDb database (http://www.zhounan.org/ferrdb/) [16], among which 241 were found to be expressed in NSCLC. The expression pro les of these FRGs were extracted to identify the differentially expressed FRGs (DE-FRGs) between normal and tumor by the "limma" package of R. |log2FC|>1and FDR < 0.05 is considered to be signi cant.
Functional enrichment analysis "ClusterPro ler" of R was utilized to conducted Gene Ontology (GO) as well as Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis to investigate the biological functions and signaling pathways affected by these DE-FRGs [17].

Screening of FRG-lncRNAs
The Pearson correlation analysis was conducted to identify the potential lncRNAs correlated with DE-FRGs based on the TCGA database. |R|>0.5 and P < 0.001 were considered a strong correlation. In addition, the lncRNAs that have a regulatory relationship with DE-FRGs and expressed in NSCLC were further explored and screened in the LncTarD database (http://bio-bigdata.hrbmu.edu.cn/LncTarD/) [18]. Then, lncRNAs that have an expression and regulatory relationship with DE-FRGs in TCGA and LncTarD database were unionized to obtain candidate FRGs-lncRNAs. The "survival" package, univariate Cox regression analysis, and Kaplan-Meier (K-M) survival method were conducted to investigate FRGs-lncRNAs associated with the prognosis in NSCLC.
FRG-lncRNAs prognostic model construction 986 NSCLC patients were separated into a training set and veri cation set at a ratio of 7:3 by R randomly (690 and 296 patients in the training and test set, respectively). Lasso-penalized Cox regression analysis was performed to establish a FRGs-lncRNAs prognostic model in the training set using the R "glmnet" package [19,20]. The risk score of each sample was calculated on the basis of the normalized expression levels and the corresponding Lasso's coe cient of the FRGs-lncRNAs [Risk score=∑exp(i)*coef(i)], and patients were further categorized as high-and low-risk groups in accordance with the median risk score.
Then "ggplot2" package of R were performed to draw the survival scatter plot. K-M survival curve and receiver operational characteristic (ROC) curves were generated to evaluate the relationship between risk score and prognosis, and the predictive capacity of the signature. Additionally, the expression levels of FRGs-lncRNAs of each sample in the two risk groups were uncovered by "heatmap" package. Finally, the results were further identi ed in the internal and external validation groups.

Predictive nomogram construction
Wilcoxon test was performed to investigate the potential relationship between risk score and multiple clinical features (EGFR mutation, ALK-EML4 rearrangement, age, gender, stage, and TNM stage). P < 0.05 was considered to be signi cant. Subsequently, the independent prognostic factors were investigated by univariate and multivariate Cox regression analysis and visualized the results by R "forestplot" package. Finally, a nomogram was established integrating the clinical features and risk score for predicting 1, 3, and 5-year overall survival (OS) of NSCLC patients, and the prediction accuracy of which was further evaluated by the calibration and ROC curve.

Immunity analysis and gene expression
Five algorithms, ESTIMATE, ssGSEA, MCPcounter, TIMER and CIBERSORT were utilized to evaluate the correlation between risk score and the immune response in NSCLC, and the heatmaps were used to uncover the differences of immune cell in ltration between the two risk groups under different algorithms.
Expression of immune checkpoint and m6A expression in NSCLC 79 immune checkpoint genes (ICGs) were explored from the literature (PMID: 32814346), of which 66 were expressed in NSCLC. Student t-test was used for difference analysis, and the top 10 ICGs with signi cant differences were shown as boxplots. In addition, the expression differences of the 13 factors involved in m6A between the high and low-risk groups were further analyzed.
Correlation analysis between risk score and chemotherapeutic drugs sensitivity All the cell lines in the Cancer Genome Project (CGP) database [21] combined with the expression pro le of the training set were utilized to evaluate the role of the risk score in the sensitivity of chemotherapeutics by using the R package "pRRophetic". A total of 138 chemotherapeutics were identi ed in the CGP database, then, Pearson correlation analysis was performed to investigate the relationship between risk score and drug sensitivity. P < 0.05 and |R| > 0.3 is considered signi cant correlation.

Statistical analysis
Perl and R (4.0.1) were utilized for data processing and statistical analysis. Benjamini-Hochberg method was performed to identify the DE-FRGs. Pearson correlation analysis were conducted to explore the lncRNAs associated with DE-FRGs. FRGs-lncRNAs associated with prognosis were explored by univariate Cox regression tests and K-M survival analysis in NSCLC. A FRGs-lncRNAs prognostic model was generated by the Lasso-penalized Cox regression analysis and further evaluated the predictive capacity by K-M survival and ROC curve analysis. Independent predictors of OS in NSCLC were identi ed by multivariate Cox regression analysis. P < 0.05 was considered signi cant.

Results
Identi cation and enrichment analysis of FRGs in NSCLC 986 patients with NSCLC and 107 healthy control from the TCGA database were enrolled in the present study. 259 FRGs were explored from the FerrDb database, and 241 were found to be expressed in NSCLC (Table S1). The process of our research is shown in Fig. 1. Herein, we found 59 FRGs were signi cantly differentially expressed between normal and NSCLC patients (28 downregulated and 31 upregulated, Fig. 2A and B). The DE-FRGs were determined mainly involved in biological pathways related to the response to oxidative stress, lipid droplet, and oxidoreductase activity biological function by GO analysis (Fig. 3A). KEGG enrichment analysis revealed that these DE-FRGs also participate in ferroptosis, arachidonic acid metabolism, glutathione metabolism, and NOD-like receptor signaling pathway (Fig. 3B).
Additionally, the K-M plot illustrated a worse prognosis in the NSCLC patients with the high-risk (P 0.001, Figure E) and ROC curve analysis demonstrated AUC for predicting NSCLC prognosis were 0.606 (Fig. 6F). The expression levels of the 9 FRGs-lncRNAs in each sample between the two risk groups are presented in Fig. 6G.

Validation of the prognostic signature
To evaluate the robustness of the prognostic model, the patients from the testing cohort were also grouped according to the median of the same risk score (Fig. 7A), and patients with high risk were found to reach the end of life earlier like the training group (Fig. 7B). At the same time, a poor prognosis was found in NSCLC patients with high risk in the testing group simultaneously (P = 0.004, Fig. 7C), and a good accuracy of the prognostic model in predicting the OS of NSCLC were also elucidated in the testing group (Fig. 7D, AUC = 0.604). The expression trend of the 9 FRGs-LncRNAs between the two risk groups of the testing cohort was similar to that of the training group (Fig. 7E).
In addition, external validation was also carried out in GES31210 to further verify the effectiveness of this prognostic model. Here, 8 lncRNAs (without MED4-AS1) were utilized for validation since MED4-AS1 expression was not found in NSCLC in this cohort. The results of the survival status scatter plot and heatmap in this cohort were similar to those of the training group ( Figure S1A-B, Figure S1 E). K-M analysis illustrated that the risk score based on these 8 lncRNAs are correlated with the prognosis of NSCLC (P = 0.016, Figure S1 C), and the ROC curve also demonstrated a good prediction performance of this prognostic signature (AUC = 0.616, Figure S1 D).

Correlation analysis between risk score and clinical characteristics
The potential relationship between risk score and multiple clinical characteristics (age, gender, EGFR mutation, ALK-EML4 rearrangement, stage and TNM stage) was also analyzed in the present study. The results illustrated that risk score was related to gender, stage and T, N stage, among which, male patients and those with higher T and N stage had higher risk (Fig. 8A-H). Further subgroup analysis found that the T stage is correlated with risk score in both LUAD and LUSC, and stages and N stage were found to relate to the LUAD, and M stage showed a relationship with LUSC ( Figure S2).
Univariate Cox regression analysis elucidated that risk score, stage, T, N, M stages are the risk factors for the NSCLC prognosis (P 0.05, Fig. 8I). Additionally, risk score as well as T, N, M stages remain independent risk factors for the OS of NSCLC patients in multivariate Cox regression analysis (P 0.05, Fig. 8J). Then, a nomogram merging clinicopathological features and the risk scores was constructed to assist clinical prediction of the prognosis of NSCLC patients (Figs. 9A), and a satisfactory agreement between the predicted and observed values at the probabilities of 1-, 3-and 5-year survival was shown in the calibration curve (Figs. 9B). What's more, the ROC curve revealed that the risk score was more accurate in predicting survival in patients with NSCLC than traditional clinical features (Fig. 9C).

Correlation analysis between risk score and immune cell in ltration
Five algorithms, ESTIMATE, ssGSEA, MCP counter, TIMER and CIBERSORT were utilized to evaluate the correlation between risk score and immune cell in ltration in NSCLC and the results were visualized as heatmap (Fig. 10A-E). The expression of ICGs between the two risk groups was further investigated due to the crucial role of checkpoint inhibitor-based immunotherapies in NSCLC. The results showed a substantial difference in the expression of 45 ICGs between the two risk groups (Table S6), and the rst 10 ICGs (BTLA, BTN2A2, CD160, CD226, CD27, CD276, CD40LG, CD96, CTLA4, TIGIT) were presented in Fig. 11A. In addition, there are 8 m6A-related mRNA differentially expressed in the two-risk group, among which, 7 m6A-related mRNAs (ELAVL1 FMR1 IGF2BP1 METTL3 RBM15 YTHDC1, YTHDC2) were highly expressed in the low-risk group, and HNRNPA2B1 was upregulated in the high-risk group (Fig. 11B).
Correlation analysis between risk score and chemotherapeutics sensitivity As shown in Fig. 12, 28 chemotherapeutics were found to be signi cantly correlated with the risk score, among which the sensitivity of 19 drugs and the resistance of 9 drugs were signi cantly correlated with the risk score.

Discussion
Lung cancer, the most fatal malignancy worldwide, has a variety of histological subtypes. As the most prevalent pathological pattern of lung cancer, NSCLC patients are usually diagnosed at an advanced stage with poor survival due to the lack of early speci c clinical manifestations, diagnostic and prognostic biomarkers [22]. Over the past decade, the long-term survival rate of advanced NSCLC has been signi cantly prolonged due to the development of targeted therapies and immunotherapy. Unfortunately, poor prognoses remain in some patients after systemic and targeted therapy [23,24]. Hence, there is an urgent need for safe and feasible markers that can accurately predict prognosis, so as to make the management of NSCLC patients more accurate, personalized and timely.
Ferroptosis is a newly type regulatory cell death with speci c properties and recognizing functions that participated in numerous diseases including cancers [25],which was identi ed involved in killing malignant cells and inhibiting tumor progression in several cancers, such as NSCLC [26], pancreatic cancer [27], breast cancer [28]and hepatocellular carcinoma [29] and consequently considered as a novel therapeutic strategy for cancer treatment. Interestingly, the crucial role of lncRNAs in the regulation of ferroptosis has been increasingly recognized [30]. Whereas, the role of FRGs-lncRNAs in prognostic, immune response, and chemotherapeutic effect in NSCLC remains unclear.
Herein, the data of LUAD and LUSC from the TCGA database were combined as a matrix of NSCLC for analysis for the rst time. We found 59 FRGs and 129 FRGs-lncRNAs differentially expressed between NSCLC and normal patients. Then, 10 FRGs-lncRNAs associated with the OS of NSCLC were further identi ed. Subsequently, a prognostic model was established in the training set based on the 9 prognostic FRGs-lncRNAs and veri ed in the internal and external cohort. Additionally, the relationship between risk score and clinical characteristics of NSCLC were assessed and a nomogram was constructed. Finally, the correlation between immune cell in ltration, chemotherapeutic sensitivity as well as risk score were investigated to evaluate the potential role of FRGs-lncRNAs in the immune response and chemotherapeutic effect in NSCLC. These ndings strongly suggest a potentially important role of FRGs-lncRNAs in NSCLC.
In our study, the predictive model was established based on the 9 FRGs-lncRNAs (AQP4-AS1, DANCR, LINC00460, LINC00892, LINC00996, MED4-AS1, SNHG7, UCA1, and WWC2-AS2). Aquaporin 4 antisense RNA 1 (AQP4-AS1) transcribes a lncRNA with unknown function, which was found related to the risk of gastric [31] and breast cancer [32]. We found that AQP4-AS1 act as a protector factor in NSCLC. Whereas, the biological function of AQP4-AS1 in ferroptosis and lung cancer has not been systematically analyzed, which need to be further studied. LncRNA differentiation antagonizing non-protein-coding RNA (DANCR), located on chr.4, is known to suppress epidermal progenitor cell differentiation [33]. Importantly, DANCR was identi ed to be overexpressed in multiple cancers, promotes malignant biological behavior cancer and chemo-resistance [34]. DANCR was also found upregulated and correlated with the poor prognosis in NSCLC, and which promote the malignancy of NSCLC thorough DANCR/miR-138/Sox4 positive feedback loop [35]. Here, we found that DANCR was a protective factor in NSCLC. LINC00460 was also demonstrated overexpressed in NSCLC and promotes epithelial-mesenchymal transition and cell migration [36], what's more, as an FRGs-lncRNA, LINC00460 was identi ed as a predictor and potential therapeutic target for EGFR-TKI resistance in NSCLC [37]. Although similar results with the previous studies were found in our study, whether ferroptosis involved in the drug resistance and development of NSCLC needs to be further investigated. Despite several kinds of research that have illuminated the role of LINC00892 and LINC00996 in multiple cancers, their function in NSCLC has not been studied and deserves further investigation considering its signi cant prognostic value in NSCLC. MED4-AS1 is a novel lncRNA that upregulated in NSCLC and positively associated with poor differentiation and metastasis [38], whereas, MED4-AS1 was identi ed as a protective factor for OS of NSCLC in the present study. The lncRNA small nucleolar RNA host gene 7 (SNHG7) was considered as an oncogenic lncRNA in NSCLC, hepatocellular carcinoma, breast cancer, and colorectal cancer [39,40], which was found to modulate malignant character in LUAD through SNHG7/miRNA-181/cbx7 pathway, and mediates cisplatinresistance in NSCLC through activating PI3K/AKT pathway [41]. Notably, the protective role of lncRNA SNHG7 was revealed in our study. Urothelial carcinoma-associated 1 (UCA1) was found to promote ge tinib-resistance in NSCLC [42], knockdown of UCA1 can impair cell proliferation and promoted the ge tinib-induced cell apoptosis, which was considered as a promising therapeutic target for the NSCLC patients with EGFR + [42]. So far, no studies have been conducted on the biological function of WWC2-AS2 in NSCLC, which needs a systematic study further.
It is worth noting that there are few studies investigating the relationship between ICGs and ferroptosis currently. We found different expression levels of BTLA, BTN2AA2, CD160, CD226, CD27, CD276, CD40LG, CD96, CTLA4, TIGIT between the two risk groups of patients with NSCLC. Of note, the risk score was also demonstrated correlated with the chemotherapeutic effect. These results indicated that these FRGs-lncRNAs may regulate the development and progression of NSCLC by modulating the immune response and play a crucial role in the drug resistance in NSCLC.
The advantage of this study is that the data of LUAD and LUSC were systematically combined for the analysis and an FRGs-lncRNAs prognostic model of NSCLC was constructed for the rst time and veri ed both in an internal and external cohort. What's more, the FRGs-lncRNAs were comprehensively identi ed both in the TCGA database and LncTarD database. Notably, the association between risk score and chemotherapeutics sensitivity was rst analyzed in the present study. However, a prospective cohort and molecular biology experiment need to be conducted to further verify the accuracy of the prognostic model and the function of these FRGs-lncRNAs due to the lack of experimental veri cation in the present study.

Conclusion
In conclusion, 10 FRGs-lncRNAs associated with the OS of NSCLC were identi ed and a novel FRGs-lncRNAs prognostic model was constructed in the present study and veri ed both in the internal and external cohort. Then, the relationship between immune response, chemotherapeutics effectiveness and risk score were further evaluated. These ndings have potential reference value for guiding the treatment and prognosis evaluation of NSCLC patients.       in low-and high-risk groups. NSCLC, non-small cell lung cancer; lncRNAs: long non-coding RNAs; FRGs, ferroptosis-related genes; FRGs-lncRNAs: FRGs related lncRNAs; ROC, receiver operating characteristic.  The univariate (I) and multivariate (J) Cox regression analysis of the associations between the risk score, clinical parameters and OS in NSCLC patients. NSCLC, non-small cell lung cancer; OS, overall survival.