Molecular Subtyping and Prognostic Prediction Based on the Ferroptosis-Related Long Non-Coding RNA Signature in Clear Cell Renal Cell Carcinoma

Background Renal cell carcinoma is gradually characteristic of the accumulation of lipid-reactive oxygen species. And, the ferroptosis-related LncRNAs have been reported strongly correlated with tumorigenesis and progression. However, the further functions of ferroptosis-related LncRNAs in clear cell renal cell carcinoma have not been explored. Methods After sample cleaning, data integration, and batch effect removal, we used the Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) public database to screen out the expression and prognostic value of ferroptosis-related LncRNAs and then performed the molecular subtyping. The molecular subtyping was developed using the K-means. Then, the KEGG pathway enrichment and statistical analyses of the immune microenvironment and clusters were carried out. The outcomes indicated signicant differences in the immune microenvironment and potential therapeutic targets between the two clusters. Then, LASSO Cox regression was performed to establish the 5-LncRNAs-based prognostic signature. The signature could accurately predict patient prognosis and be used as an independent clinical risk factor. And the 5 LncRNAs were validated in the cell lines and clinical samples. We then combined signicant clinical parameters in multivariate Cox regression and prognostic signature to construct a clinical predictive nomogram, which provides appropriate guidance for predicting the overall survival of ccRCC patients. integrating the prognostic signature and signicant clinical parameters. The results showed that a good predictive performance to the overall survival (OS) of ccRCC patients.


Introduction
Renal cell carcinoma (RCC) is one of the most common malignant tumors of the urinary system (1). As the major pathologic subtype of RCC, clear cell renal cell carcinoma (ccRCC), especially metastatic ccRCC, often means high morbidity and mortality (2). To our best knowledge, almost 25%-30% of ccRCC patients are found metastasis at initial diagnosis, which usually means poor prognosis. Moreover, the tumor node metastasis (TNM) staging system, currently applied in clinical practice, is considered less accurate in evaluating the prognosis and progression of ccRCC patients (3). Meanwhile, 73%-75% of identi ed ccRCC driver aberrations were subclonal, which might contribute to different clinical outcomes (4,5). Hence, performing suitable molecular subtyping and exploring new prognostic signatures to diagnose and evaluate the prognosis of the ccRCC patients remains signi cant.
During the past decades, ferroptosis has been gradually identi ed as an iron-dependent, nonapoptotic cell death mode characterized by the accumulation of lipid reactive oxygen species (6, 7). Increasingly studies have shown that dysregulation of ferroptosis-related genes (FRGs) plays important role in the occurrence and development of many diseases, especially cancer(8- 10). The studies related to ferroptosis have become the focus in treating and detecting related diseases. Meanwhile, as the noncoding RNA plays an emerging role in cancer, long non-coding RNAs (LncRNAs) have been a hot topic of research (11). Interestingly, increasing studies have demonstrated that LncRNAs could participate in the ferroptosis process and then in uence tumor development (12)(13)(14). Therefore, probing ferroptosisassociated LncRNAs might provide new ideas and sights for ccRCC treatment and prediction.
In this study, we rst explored the potential biological functions and associations of ferroptosis-related genes. Afterward, we screened the prognostic ferroptosis-related LncRNAs and conducted the molecular subtyping in the TCGA database. The correlation between the molecular cluster and immune in ltration was explored. Afterward, we conducted the LASSO Cox regression to establish a 5-ferroptosis-related LncRNAs signature in ccRCC patients and validated it in the ICGC database. Further, the nomogram was conducted integrating the prognostic signature and signi cant clinical parameters. The results showed that a good predictive performance to the overall survival (OS) of ccRCC patients.

Data selection and processing
The ccRCC sequencing data (HTseq-FPKM) and the latest corresponding clinical information Total RNA of 10 paired clinical samples and six cell lines were extracted by Takara kit according to the manufacturer's protocol. Then, the RNA was reversed to cDNA in a 20ul reaction system. The quantitation of all gene transcripts was done by qPCR using SYBR Premix ExTaq kit, and TUBA was used as a normalized control. The primer sequences were listed in the supplementary Table 3. Each reaction was performed four times, and the 2^-△△CT method was used to calculate the relative mRNA expression level.

Identi cation of the prognostic ferroptosis-related differentially expressed LncRNAs
According to the former studies (15)(16)(17), we obtained 259 FRGs, and the list was shown (Supplementary Table 4). After that, we screened the ferroptosis-related LncRNAs with the lter (Correlation >0.5, p-value < 0.01), including 2854 LncRNAs. The limma package was conducted to determine the differentially expressed LncRNAs (DELncRNAs) between the ccRCC patients and normal controls, including 1333 LncRNAs(18). We intersected the DELncRNAs and ferroptosis-related LncRNAs to obtain the ferroptosisrelated DELncRNAs. Univariate Cox regression was performed in the ferroptosis-related DELncRNAs with overall survival time. Those with p-value < 0.01 were considered prognostic ferroptosis-related DELncRNAs.

Molecular subtyping in ccRCC based on the prognostic ferroptosis-related DELncRNAs
After obtaining the prognostic ferroptosis-related DELncRNAs (FRDELncRNAs), we performed consensus clustering to identify the molecular subtypes of ccRCC by using the "ConsensusClusterplus" R package (19). We selected 80% of the prognostic ferroptosis-related DELncRNAs resampling 100 times and determined clusterings of speci ed cluster counts (k). Following this, the pairwise consensus values were calculated and stored in a symmetrical consensus matrix for each k. The k, at which there is no appreciable increase, was determined by the cumulative distribution function (CDF) plot and delta area plot. The alteration of Immune in ltration between different clusters was estimated using the Cibersoft method.

Potential biological functions enrichment
To gain insights into the cellular functions directly regulated by FRGs transcriptional control, we compared the list of FRGs to the biological pathways annotated by the Kyoto Encyclopedia of Genes and Genomes (KEGG) (20). Afterward, according to the two clusters, Gene Set Variation Analysis (GSVA) was conducted using the package of the same name in R software v.4.0.3 to investigate the enrichment of HALLMARK pathways with the h.all.v7.4.symbols.gmt gene set from the Molecular Signature Database (21).

Potential therapeutic targets analysis
Since targeted drugs are commonly used to treat advanced kidney cancer, we used the R package "pRRophetic" to estimate drug response as determined by the half-maximal inhibitory concentration (IC50) for each kidney cancer patient on the Genomics of Drug Sensitivity in Cancer (GDSC) website (22).
Further, based on the TCIA database, the Immunophenoscore (IPS) was calculated depending on immune therapy data (23).
Construction of the prognostic predictive risk signature Firstly, the TCGA cohort patients were randomly divided into the training set and internal validation set. Meanwhile, the patient in the ICGC cohort was used as the external validation cohort. Based on the prognostic ferroptosis-related DELncRNAs, we constructed the least absolute shrinkage and selection operator (LASSO) Cox regression using the "glmnet" R package. We calculated each patient's riskscore using the regression coe cient score of the individual LncRNAs and their expression value. Besides, we de ned the formula for calculating the prognostic risk score as follow: Risk score = coef(Lnc1)*Exp(Lnc1) + coef(Lnc2)* Exp (Lnc2) + … + coef(Lncn)* Exp (Lncn). Where "coef" represented the coe cient score estimated by LASSO Cox regression, and "Exp" represented the expression value of the individual LncRNAs. The detailed information from the signature was shown (Table 1). Then, we classi ed the ccRCC patients into the high-and low-risk groups, according to the median risk score of the training group as the cut-off (24).

Validation of the prognostic risk signature
We conducted the Kaplan-Meir and receiver operating characteristic (ROC) curve analyses to assess the prognostic risk signature's validity. According to the calculated median risk score, all samples in each group were divided into high-and low-risk groups, and the survminer and timeROC packages were performed to validate the predictive accuracy in the training and validation sets. The area under the curve (AUC) values corresponding to 1-, 3-, and 5-years were calculated. The time-dependent ROC curve was used to validate the predictive performance of the signature. The AUC value of 0.75 or higher was considered the signi cant predictive value, and the value of 0.60 or higher was regarded as acceptable for prediction. Furthermore, univariate and multivariate Cox regression was conducted to explore if the ferroptosis-signature(FerroSig) could serve as the independent factors.

Construction and validation of the nomogram
To better predict the prognosis of patients with renal clear cell carcinoma, we established the predictive nomogram based on clinical parameters and prognostic signature (25). Brie y, we rst performed univariate and multivariate Cox regression analyses to identify clinical parameters and riskscore that could be used as independent risk factors. Subsequently, the signi cant factors were used to construct the predicted nomogram. We then evaluated the nomogram effect using calibration curves and timedependent ROC curves. The AUC value of 0.75 or higher was considered a signi cant predictive value, and the value of 0.60 or higher was regarded as acceptable for prediction.

Results
Identi cation of the prognostic FRDELncRNAs The data processing was performed as described in the methods above. Meanwhile, the ow chart of the whole process was shown (Fig 1). To explore the potential functions of the FRGs, we rst conducted the KEGG pathway enrichment analysis. The outcome showed that FRGs were mainly enriched in ferroptosis and autophagy pathways (Fig 2A). Afterward, we explored the FRLncRNAs with the correlation > 0.5 and p-value < 0.01, using the limma package. Following this, 1333 DELncRNAs between the ccRCC and normal samples from the TCGA set were screened, and the 5 most obvious up-and down-regulated LncRNAs were labeled ( Fig 2B). The overlapped LncRNAs in both DELncRNAs and FRLncRNAs were identi ed as the FRDELncRNAs. There were 723 LncRNAs for OS selected as prognostic FRDELncRNAs for subsequent analyses (Fig 2C).

Molecular subtyping showed differences in therapeutic choice and immune microenvironments
The prognostic FRDELncRNAs above were used to screen the molecular subtypes of ccRCC by using ConsensusClusterPlus packages. K-means method was performed for clustering, and 80% of the LncRNAs were sampled 100 times using the resampling model. The consensus CDF and delta area were calculated to determine the clustering outcomes, as shown in Fig 3B. When cluster number was 2, there was no signi cant increase in the area under the CDF curve. Hence, we nally divided the samples in the TCGA cohort into cluster 1 and cluster 2. The representative consensus matrix of two clusters ( Fig 3A) displayed a well-de ned 2-block structure. Principal component analysis (PCA) showed that the samples from two clusters could be well separated ( Fig 3C). Afterward, the heatmap integrating the expression of the prognostic FRDELncRNAs and clinical parameters in each subtype were shown ( Fig 3D). The results showed that the pM stage was higher in cluster1 than in cluster, which also clari ed that the patients in cluster 1 had a worse prognosis.
Comprehensively analyses of the molecular clusters Subsequently, we explored the HALLMARK pathway enrichments alteration between the two clusters using the GSVA method. The results showed that several oncogenic pathways were signi cantly altered between the two clusters, such as Hypoxia and Apoptosis (Fig 4A). Further, using the pRRophetic packages, we calculated the IC50 between the two clusters for clinical drugs commonly used in ccRCC patients. The outcome indicated that the patients in cluster 1 were more sensitive to Daunorubicin and Tipifarnib. In contrast, using the Dasatinib, Paclitaxel, Sorafenib, and Pazopanib would be more effective for patients in cluster 2 (Fig 4B).
Since the tumor microenvironment (TME) plays a vital role in the development of tumors, we further explored the correlation between the TME and molecular clusters. As shown in Fig 5A, we rst calculated the stromal score, immune score, and ESTIMATE score. Only the immune score showed a signi cant difference between the two clusters. We then explored the differentially expressed immune cells using the Cibersoft method. Cluster 1 exhibited a higher in ltration of CD8 T cells, regulatory CD4 T cells, and Neutrophils. In comparison, cluster 2 showed a higher in ltration of macrophages ( Fig 5B). Meanwhile, the expression level of some potential immunotherapy targets changed signi cantly in both classi cations. Potential immune therapeutic targets such as BRAF and PD-1 expression levels were markedly higher in cluster 1 (Fig 5C). Next, we explored the TCIA database for differences in the presence of immunotherapeutic targets. The results showed Cluster 1 patients were more effective when treated with CTLA4, whereas cluster 2 patients were more effective when receiving the combination of CTLA4 and PD1. (Fig 5D).

Construction and validation of the prognostic signatures based on the prognostic FRDELncRNAs
Based on the prognostic FRDELncRNAs obtained from the univariate Cox regression analysis, we constructed a 9-FRDELncRNAs based prognostic signature using the LASSO regression (Fig 6A and  6B). The detailed information of the LncRNAs from the signature was shown in Table 1. The risk score is constructed according to the following formula: risk score = 0.058 (LINC00460) + 0.088 (LINC00894) + 0.118 (VPS9D1-AS1) + 0.013 (CYTOR) + 0.075 (FOXD2-AS1). Then, based on the calculated median risk score cut-off, patients were divided into the high-and low-risk group. The risk score distribution, survival status, and expression of LncRNAs from the signatures are exhibited in the TCGA training cohort, TCGA validation set, and ICGC validation set (Fig 6C-E). The Kaplan-Meier log-rank test and the time-dependent ROC curve were used to evaluate the predictive ability and accuracy of the prognostic signature. The outcome of the Kaplan-Meier log-rank test showed that the high-risk group had a signi cantly worse OS compared with the low-risk group in the TCGA training set (Fig 7A), TCGA validation set (Fig 7B), and ICGC validation set ( Fig 7C). Meanwhile, the time-dependent ROC curve proved the 1-year, 3-year, and 5year predictive accuracy of the signature for OS (Fig 7A-7C).

Construction and validation of the prognostic nomogram
After establishing and validating the signatures based on the prognostic FRDELncRNAs, univariate and multivariate Cox regression analyses were used to explore independent risk factors in the TCGA dataset. As shown in Table 2, univariate Cox regression analysis showed that AJCC stage, ISUP grade, Age, and RiskSig were signi cantly correlated with OS (Table 2). Meanwhile, the multivariate Cox regression analyses of the clinical parameters above, AJCC stage, ISUP grade, Age, and RiskSig, were signi cantly correlated with OS (Table 2).
To better assess patient prognosis and guide clinical decision-making, we established the nomogram integrating the risk signature and signi cant clinical parameters in the multivariate Cox regression analyses (Fig 8A). The C-index showed a good agreement of 0.773, and an established nomogram was shown (Fig 8B). The calibration curves showed that the predictive nomogram could well predict the survival status of patients at 1, 3, and 5 years (Fig 8C). Also, the nonogram showed better predictive value than clinical indicators and ferroptosis-related signatures (Fig 8D).

Validation of the LncRNAs in cell lines and clinical specimens.
We nally validated the 5 LncRNAs in signature in ccRCC clinical samples and RCC cell lines. Compared to the normal cell line HK-2, the expression levels of LncRNAs in RCC cell lines were inconsistent, with higher or lower levels present (Fig 9A). The results may be due to the inability of a single cell line to mimic the overall situation of RCC and adjacent normal tissue. We then examined the expression levels of the corresponding mRNAs in 10 paired clinical samples, in which LNC000894, VPS9D-AS1, CYTOR were highly expressed in cancer, while LNC000460 and FOXD2-AS1 were not evident (Fig 9B).

Discussion
Clear cell Renal Cell Carcinoma is a molecularly heterogeneous tumor characterized as radiotherapy and chemotherapy-resistant (2,5). With the development of diagnostic and therapeutic techniques, the 5-year survival rate of ccRCC patients has been signi cantly improved. However, 25%-30% of ccRCC patients were found metastases at initial diagnosis(4). The 5-year survival rate is merely 10%. Most ccRCC patients had no apparent symptoms such as pain and hematuria, usually resulting in diagnostic di culty in the early stage. Moreover, the current TNM staging system used in clinical practice always lacked accuracy for prognostic evaluation. Early diagnosis and accurate assessment of ccRCC patients remains challenging for the reasons mentioned above. Therefore, it is still vital to identify new clinical and molecular biomarkers.
Ferroptosis processes were involved in essential roles in the progression and tumorigenesis of RCC(26). Of note, the expression levels of various iron-related genes were signi cantly correlated with ccRCC patient prognosis, suggesting that targeting ferroptosis could be an effective option for ccRCC treatment (27,28). Moreover, LncRNAs played important roles in regulating the expression of ferroptosisrelated genes and the process of ferroptosis (29,30). Hence, in this study, we comprehensively analyzed the expression and prognosis of ferroptosis-related LncRNAs. We subsequently molecularly typed and developed a prognostic model based on nine LncRNAs in patients with ccRCC, using the method above.
Although many molecular subtyping of ccRCC based on gene expression has been proposed in recent years, the cluster of LncRNAs associated with ferroptosis has not been fully explored (31). Therefore, we divided ccRCC patients into two clusters based on prognostic DEFRLncRNAs using the NMF algorithm. PCA analysis showed signi cant differences between the two clusters. The K-M plot showed that cluster1 had a worse prognosis than cluster2. Since the immune components of the tumor microenvironment and immune cells signi cantly regulate tumor development (32). We further compared the differences in immune cell in ltration between the two clusters. We found that the CD8 + T cell, T cell regulatory, T cells follicular helper, and B cells memory were presented high expression levels in cluster 1. In contrast, the expression levels of neutrophils and macrophages were signi cantly increased in cluster 2. Unlike most cancer types, previous studies have shown that the density of CD8 T-cells correlates with poor prognosis in patients with RCC (33,34). This also explains why patients in cluster 1 have a worse prognosis. Moreover, the in ltration of mesenchymal cells and neutrophils may serve as a protective factor for RCC. We then compared the variations in some of the potential immune therapeutic targets. The results the patients in cluster 1 might be more sensitive to the immune therapy. These results imply molecular subtyping for individualized treatment of patients.
Afterward, we established the predictive signature using LASSO regression. K-M plot and time-dependent ROC curves showed that the predictive signature exhibited good predictive performance. Furthermore, most of the LncRNAs in our signature had been reported in various cancer types. LINC00460 has been extensively studied in cancer and shown in several studies to be a prognostic target in renal clear cell carcinoma (35)(36)(37). LINC00894 was reported to promote breast cancer metastasis by regulating ZEB1(38). CYTOR and VPS9D1-AS1 were associated with multiple cancer prognoses and could regulate the progression of multiple cancers through sponge miRNAs (39)(40)(41). Similarly, FOXD2-AS1 was strongly associated with the prognosis and progression of cancer patients in various cancers(42, 43). These validation results in multiple datasets and literature mining results, indicating that the prognostic signature predicts the prognosis of ccRCC patients and may function as the regulator of ccRCC progression.
Indeed, the whole article has several certain limitations. First of all, our data are based on TCGA and GEO databases, and more independent datasets are needed for testing and validation. Secondly, some LncRNAs in our signature play an essential role in cancer and need to be validated in future experiments. In conclusion, we rst systematically analyzed the expression and prognostic value of ferroptosis-related LncRNAs and assessed immune in ltration and potential prognostic targets by molecular subtyping of ccRCC patients.

Conclusion
In conclusion, our study identi ed the FRDELncRNAs and successfully constructed an individualized ccRCC signature (riskScore), which proved to be signi cantly correlated with OS in both the training and validation cohorts. We also estimated the potential relationship between immune cell in ltration, immunotherapy-related targets, and potential therapeutic drugs between molecular subtyping clusters. Our research was anticipated to provide new insights into ferroptosis-related LncRNAs for future work. Public databases analyzed in this study could be found here: TCGA (https://portal.gdc.cancer.gov/) and GEO database (https://www.ncbi.nlm.nih.gov/geo/). The authors will provide the original data supporting the conclusions of this paper without undue reservation.

Competing interests
The authors declare that there are no con icts of interest.

Funding
This study was supported by grants from the National Natural Science Foundation of China [81670617,81672546,81602253,81772703,81872083], Natural Science Foundation of Beijing [7152146, 7172219], and Wuxi "Taihu Talents Program" Medical and Health High-level Talents Project. All authors would like to thank all contributors to the TCGA and GEO projects. Moreover, all authors thank all researchers at the Institution of Urology, Peking University.
Authors' contributions L Zhou, J Lin, X Li, S He, Y Gong, and Z Zhu designed and supervised the study, drafted the manuscript, and interpreted the data. L Zhou, J Lin, and X Li provided the funding. S He and Y Gong provided advice and supervision. Z Zhu, X Ji, W Zhu, C Xu, T Cai, C Huang performed the experiments, obtained, and analyzed the data.  Figure 1 Flow chart of the whole analysis processes of this study.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.