A Novel E3-ubiquitin Ligase-Related LncRNA Signature for Predicting Prognosis in Patients with Lung Adenocarcinoma


 Background: E3 ubiquitin ligase mRNA plays an important role in mediating tumor microenvironment, and is involved in tumor initiation and progression. However, few studies have realized the value of E3 ubiquitin ligase-related lncRNAs in lung adenocarcinoma (LUAD).Methods: Herein, we comprehensively evaluated the E3-ubiquitination patterns including multiple tumor-related molecular phenotypes in LUAD samples using lncRNA profiling from GEO and TCGA database, identified a survival-related risk signature consisting of E3-ubiquitin ligase-related lncRNAs via LASSO and multivariate stepwise Cox regression analysis. Based on the risk score calculated for each sample, LUAD patients were divided into high- and low-risk groups. The predictive value of the signature in overall survival was explored, and a nomogram integrating the risk signature and clinical characteristics was identified and tested. Results: A risk signature consisting of 7 specific E3-ubiquitin ligase-related lncRNAs was screened, and can be viewed as a reliable independent predictor of prognosis. We performed consensus clustering analysis and successfully identified 4 molecular subtypes significantly linked to the OS of LUAD, which validates the prognostic and predictive value of this signature to some extent. The ssGESA analysis revealed that the high-risk group was bound up closely with epithelial-mesenchymal transition, hypoxia, and PI3K/AKT/mTOR pathways, and had a worse outcome. Moreover, we created a nomogram consisting of pathological staging and risk score. Validation analysis demonstrated high conformity of nomogram predictive probability and actual overall survival in LUAD of TCGA and GEO datasets.Conclusion: The model consisting of specific E3-ubiquitin ligase-related lncRNAs contributes to predicting the prognosis of LUAD patients.


Introduction
Lung adenocarcinoma (LUAD), a most common type of non-small cell lung cancer (NSCLC), accounts for about 40% of all lung cancers (Denisenko et al., 2018). The new therapeutic strategies such as molecular targeted agents and immune checkpoint inhibitors have demonstrated astounding durable e cacy in a small fraction of LUAD patients. However, the majority of patients still experience dismal prognosis, far from a satisfactory clinical need. The average ve-year survival rate of LUAD patients is less than 18%, mainly due to late diagnosis, recurrence, and metastasis (Zhao et al., 2018). In this scenario, identifying promising diagnostic biomarkers and therapeutic targets for LUAD is of great importance.
Ubiquitination is a dynamic and reversible process in which ubiquitin covalently conjugates onto lysine (K) resides of target proteins. The process is catalyzed by sequential actions of ubiquitin-activating (E1), ubiquitin-conjugating (E2), and ubiquitin-ligating (E3) enzymes. E3 ubiquitin ligases modulate the transfer of ubiquitin to substrate proteins determining their fate (Zhou and Sun, 2021; Kwon and Ciechanover, 2017). E3-mediated ubiquitin-proteasome system regulates the activation and repression of the innate and adaptive immune responses, mediates tumor microenvironment, and induces tumor initiation and progression (Budroni and Versteeg, 2021;Liu et al., 2021). For example, Kohli et al. reported that UBE3A, also known as E6-associated protein (E6-AP), was the founding member of the HECT family of E3 ubiquitin ligases. UBE3A-mediated SIRT6 degradation promoted cell proliferation and invasiveness in liver cancer cells (Kohli et al., 2018). E3 ubiquitin ligase ITCH interacts with ASPP2 to suppress Smad7 degradation, and thus negatively regulates TGF-β1-induced epithelial-mesenchymal transition (EMT) and ultimately inhibits gastric cancer invasion (Gen et al., 2017). Ubiquitin-mediated NOTCH1 protein (N1ICD) is abnormally activated via RNF8 degradation leading to increase the activation of Notch signaling, promoting the initiation and progression of breast cancer (Li et al., 2018a). The overexpression of E3 ubiquitin ligase NEDD4 leads to the increase of cell migration in lung cancer through facilitating EGFinduced cathepsin B secretion by lysosomal exocytosis (Shao et al., 2018).
Long non-coding RNA (lncRNA), characterized by more than 200 nucleotides in length, is believed to play an essential role in tumorigenesis and progression (Wu et  mechanism to enhance the proliferation, migration, and invasion of NSCLC cells (Li et al., 2020). Although a growing number of lncRNAs have been annotated, the speci c role of lncRNAs related to E3 ubiquitin ligase in the occurrence and development of cancers remains unclear. In the study, we are the rst to comprehensively describe the biological function of E3-ubiquitin ligase-related LncRNAs in lung cancer, and boost the understanding of the roles and mechanism of action of E3-ubiquitin ligase-related lncRNAs in lung cancer using lncRNA pro ling analysis with the intention of selecting important biomarkers. Moreover, we also identi ed for the rst time the E3-ubiquitin ligase-related LncRNA signature for predicting the prognosis of LUAD patients.

Data Download and Preprocessing
RNA-Seq and corresponding clinical parameters were downloaded from The Cancer Genome Atlas (TCGA) database consisting of 535 tumor and 59 normal samples. Patients with complete follow-up were randomized into training and internal testing groups at a ratio of 6:4. LUAD samples from GSE412717 were also employed as an independent external validation set. A total of 196 genes encoding E3-ubiquitin ligase were obtained from the UbiBrowser, a bioinformatics platform providing a comprehensive and upto-date dataset of proteome-wide human E3-substrate interaction network (Li et al., 2017). The study did not need the ethics committees` approval due to the open-access database including the TCGA or GEO.
The work ow diagram of this study was presented in Figure 1.

Identi cation and Evaluation of Risk Signature
We extracted lncRNAs data (FPKM values) from TCGA and GEO databases respectively and performed the intersection of lncRNAs. Subsequently, we screened E3-ubiquitin ligase-related lncRNAs using Pearson correlation (Cor >0.5 or Cor < -0.5, P <0.001) after excluding genes with FPKM value less than 0.5. After then, to construct a speci c signature composed of E3-ubiquitin ligase-related LncRNAs, the least absolute shrinkage and selection operator (LASSO) Cox regression with an estimation of penalty parameter through performing 10-fold cross-validations and multivariate stepwise Cox regression analysis were employed to recognize the key lncRNAs based on the above-mapped lncRNAs. On the basis of gene expression together with regression coe cients (β genes*expression level of genes), the risk score was computed. According to the median cutoff of the risk score in the training cohort, LUAD patients were dichotomized into high-and low-risk groups. The Kaplan-Meier survival curves and receiver operating characteristic (ROC) curves were performed to evaluate the predictive e cacy and reliability of the signature. Furthermore, together with other clinical variables, the created predictive signature was incorporated into the stepwise multivariate analyses. Relative expression of lncRNAs identi ed between normal and tumor samples was further validated.

Consensus Clustering for Risk Signature
To further validate the predictive performance of the signature above, we performed the cluster analysis to categorize LUAD patients into several subtypes using the R "ConsensusClusterPlus" package. Furthermore, we also calculate the Immune Score and Stromal Score of LUAD samples in the TCGA cohort via ESTIMATE algorithm (Yoshihara et al., 2013). Enrichment analysis was utilized to evaluate the enrichment pathways of differentially expressed genes (DEGs) between low-and high-risk groups via R package "ClusterPro ler" (Yu et al., 2012). A P value of <0.05 was considered as a threshold to judge signi cant categories in GO and KEGG analysis. Differences in top enriched signaling pathways and other tumor-related pathways between distinct risk signature groups from the TCGA cohort were assessed and compared using the GSEA algorithm.

Construction of nomogram
Nomogram integrating with the risk score and prognosis-related clinical parameters like pathological staging was established to illustrate the 1-, 2-and 3-year survival probabilities of I-III stage LUAD patients according to the results of multivariate Cox regression analysis. Calibration plots delineated by bootstrapping method investigated the agreement between the survival probabilities between reality and prediction based on the constructed nomogram. Receiver operating characteristic curves (ROCs) with the area under the curve (AUC) values also were developed to evaluate the predictive ability of the nomogram compared with the TNM staging system.

Statistical Analyses
All statistical analyses were performed using R software (version 3.6.1) unless otherwise indicated.
Univariate and multivariate Cox regression analyses were utilized to evaluate the independent prognostic value of E3-ubiquitin ligase-related lncRNAs in terms of OS. Kaplan-Meier curves were employed to compare the OS between high-and low-risk groups. Mann-Whitney-Wilcoxon test was utilized to compare the distribution of immune cell in ltration between high-and low-risk groups. The prognostic capacity of risk signature and nomogram was evaluated by receiver operating characteristic (ROC) curves (R package "timeROC") and AUC values. In terms of the prognostic predictive accuracy and precision of the nomogram, the calibration plots also were performed. P < 0.05 was considered statistically signi cant unless otherwise stated.

Construction of E3-ubiquitin Ligase-Related Signatures lncRNA Risk Signature
A total of 337 E3-ubiquitin ligase-related lncRNAs were extracted from 14,086 shared lncRNAs in both TCGA and GEO datasets, followed by LASSO regression analysis (Figure 2A  The expression level of 7 lncRNAs was shown in Figure 2D. Patients were divided into high-and low-risk cohorts according to the median risk score of the training group. Meanwhile, the distribution of expression heat map, risk score, and survival status of LUAD patients in the three cohorts were plotted in Supplemental Figure 1 in detail. These genes are most likely to have important effects on LUAD.

Validation of Prognostic Value of Risk Signature
The Kaplan-Meier analysis for each dataset proved that high-risk group had worse survival status compared with the low-risk group (P < 0.05) among patients with LUAD ( Figure 3A-3C). ROC curves suggested that the E3-ubiquitin ligase-related lncRNA signature had good prognostic prediction ability in LUAD patients. The AUC of 2-, 3-, and 5-year was 0.75, 0.75, and 0.76 in the TCGA training cohort, 0.59, 0.63, 0.71 in the TCGA testing cohort, and 0.6, 0.63, 0.63 in the GEO cohort, respectively ( Figure 3D-3F). Forest plot of the multivariate analysis (P <0.01, Figure 3G-3I) indicated that the risk signature was an independent prognostic factor compared to other predictors (age, gender, tumor stage, and smoking status).

Identifying Molecular Subtypes Based on E3-ubiquitin Ligase-related LncRNA Signature
To further investigate the prognostic and predictive value of the above signature, we then performed unsupervised clustering and found 4 stable subtypes in LUAD based on TCGA and GEO data ( Figure 4A-4C and Supplemental Figure S3). Kaplan-Meier analysis demonstrated signi cant survival discrepancies among the four subgroups and cluster D had the worst outcome compared to the other three subgroups (P<0.001, Figure 4D). This analysis further indicated that the 7 lncRNAs related to E3-ubiquitin ligase may be feature genes.

Correlation Assessment Between Immune Risk Signature and Immune Cell In ltration and Functions Pathways
We further explored the potential relationship between the risk signature and the immune response in LUAD. A clear distinction of immune in ltration level between high-and low-risk groups was presented via ssGESA ( Figure 5A). The in ltration level of anti-tumor immunocyte including activated CD4 + /CD8 + T cell and effect memory CD8 + T cell was higher in the low-risk group (P<0.01) than that in the high-risk group.
The levels of intermediate immunocyte subtypes including eosinophil, immature B cell, activated B cell, mast cell and T follicular helper cell were also elevated in the low-risk signature group (P<0.001) ( Figure  5A). There is no signi cant difference between high-and low-risk groups in terms of stromal and immune scores. Top KEGG enriched signaling pathways were depicted in Figure 5B, the enriched signaling pathways involving in the low-risk group were NOD-like receptor signaling pathway, neuroactive ligandreceptor, cytokine-cytokine receptor interaction, cAMP signaling pathway, cell adhesion molecules, and so on. In the high-risk group, there were two mainly enriched signaling pathways --viral carcinogenesis and necroptosis.

Characteristics of Transcriptome Traits in Tumor-related Regulation
To reveal the role of E3-ubiquitin ligase-related risk groups in the tumor-related regulation, we further studied the expression difference of genes associated with the transcripts of some chemokine and cytokine in different regulatory pathways including angiogenesis, apoptosis, epithelial-mesenchymal transition (EMT), hypoxia, and P53 signaling pathway. The ssGSEA analyses for the activity of multiple tumor-related regulations revealed that the high-risk group with worse survival were signi cantly related with EMT pathways, hypoxia pathways, Notch signaling, and PI3K/AKT/mTOR signaling pathways ( Figure 6). Moreover, we compared the expression difference of multiple genes related to the regulation of EMT and hypoxia. The result indicated that these genes relevant to EMT and hypoxia pathway were signi cantly upregulated in the high-risk group.

Construction and Validation of the Nomogram
A nomogram was established to predict the probability of the 1-, 2-, and 3-year overall survival of LUAD with I-III stage in the TCGA cohort ( Figure 7A). In assessing the predicting e cacy at 1-year OS, the AUC for nomogram, tumor stage, and risk score was 0.780, 0.700, 0.691, respectively ( Figure 7B). The 2 years AUC of the nomogram, tumor stage, and risk score was 0.810, 0.749, 0.702, respectively ( Figure 7C). In terms of 3 years AUC, the value of nomogram, tumor stage, and risk score was 0.772, 0.717, 0.679, respectively ( Figure 7D). The calibration curves for the 1-, 2-and 3-year survival revealed excellent agreement in both prediction and observation in training cohorts ( Figure 7E-G). Validation analysis in the GSE412717 cohort showed identical results (Supplemental Figure S3).

Discussion
Here, we systematically for the rst time analyzed the role of E3-ubiquitin ligase-related LncRNAs in LUAD prognostic prediction by bioinformatics and built a reliable prognostic signature based on seven speci c lncRNAs. Further analyses validated the predictive e cacy in the TCGA and GEO datasets. This signature strati ed the patients into high-and low-risk subgroups, characterized by distinct tumor immune in ltration level and tumor-related regulatory phenotypes such as EMT and hypoxia via ssGSEA analyses.
Through multivariate Cox regression analysis, this signature was included in a fusion nomogram combined with the pathological staging to predict the outcome of LUAD with the I-III stage. Taken together, the newly identi ed risk model may serve as a promising tool for prognosis estimation of LUAD patients.
Mounting researchers have discovered that E3-ubiquitin ligase-related lncRNAs could mediate immunity and be involved in multiple tumor-related pathways. Our study evaluated 28 immune cell in ltrations and found that risk score was negatively associated with activated CD4+ T cells and effector memory CD8+ T cell that exerted an antineoplastic effect by releasing TNF-α and IFN-γ (Gabrilovich et al., 2012; Apetoh et al., 2011). Thus, low-risk samples tend to have a favorable prognosis. Meanwhile, the high-risk group presented a tight correlation with the pathways/phenotypes including epithelial-mesenchymal transition (EMT), hypoxia, Notch signaling, and PI3K/AKT/mTOR signaling pathways. E3-ubiquitin ligase-related lncRNAs had been described to regulate tumor progression via EMT. Pan et al. demonstrated that lnc-CTSLP4 could reverse EMT of GC cells by recruiting E3 ubiquitin ligase ZFP91 to decrease NRNPABmediated Snail transcription, thereby inhibiting GC progression (Pan et al., 2021). LINC00858 could sponge miR-134-5p to promote E3 ubiquitin ligase RAD18 elevation, resulting in enhancing EMT of cancer cells (Xue et al., 2020). Besides, the overexpression of lnc-DARS-AS1 was able to interfere with the interaction between RBM39 and E3 ubiquitin ligase RNF147 via hypoxia regulation, leading to myeloma genesis (Tong et al., 2020). In short, E3-ubiquitin ligase-related lncRNAs are inversely associated with the migration and invasion of LUAD.
We identi ed 7 E3-ubiquitin ligase-related lncRNAs included in the risk signature and further developed a nomogram. GABPB1-AS1 was highly expressed in normal tissue and acted as a favorable gene in renal carcinoma (Gao et al., 2020a), and its overexpression could inhibit proliferation, migration, and invasion. A previous study reported that LINC02178 may be a promising prognostic biomarker of cancers like LUAD (Li et al., 2018b). Other lncRNAs were reported for the rst time and are worthy of further investigation.
Nomogram, as a graphical representation tool, has been widely used to integrate a variety of variables especially clinical information and genomics in the evaluation of prognosis (Balachandran et al., 2015). According to the hazard ratio of the various factors, each variable can get a corresponding score, and clinicians will easily calculate the total score and offer the corresponding survival probability to individuals. Currently, researchers have performed nomograms to provide individualized prognosis predictions in multiple cancers like breast cancer, gastric cancer, lymphoma, and lung cancer (Wang et al., 2021;Geng et al., 2021;Gao et al., 2020b;Wu et al., 2021). Herein, we constructed a prognostic nomogram by combining pathological staging with E3-ubiquitin ligase-related lncRNA signature. The result revealed that the accuracy and clinical usefulness of our nomogram for predicting prognosis were excellent compared to the TNM staging system.
There were several limitations in this study. Our study included the TCGA and GEO datasets and all results were derived and validated using them. Given the limited number of patients with an advanced stage in the two datasets, we only established and validated a nomogram for predicting survival probability in stage I-III LUAD patients. Additionally, additional LUAD samples should be employed to estimate the predictive value of lncRNAs. Likewise, the roles of the lncRNAs and their interactions with E3ubiquitin ligase genes should be determined in future researches in vitro and in vivo.

Conclusions
In conclusion, our study identi ed seven lncRNAs related to E3-ubiquitin ligase with stable survival predictive value through performing an integrated analysis of large-scale lncRNA expression pro les from the GEO and TCGA databases. Meanwhile, we developed the corresponding prognostic signature, which would contribute to the understanding of the molecular mechanism in LUAD.       The integrated nomogram consisted of the seven-gene signature and pathological staging for predicting LUAD at 1-, 2-, and 3-year OS in the TCGA database.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.