Development and Validation of a Lncrnas-Based Nomogram for Survival Prediction in Neuroblastoma Patients

Treatment of neuroblastoma is evolving toward precision medicine. LncRNAs can be used as prognostic biomarkers in many types of cancer. Methods Based on the RNA-seq data from GSE49710, we built a lncRNAs-based risk score using the least absolute shrinkage and selection operation (LASSO) regression. Cox regression, receiver operating characteristic curves were used to evaluate the association of the LASSO risk score with overall survival. Nomograms were created and then validated in an external cohort from TARGET database. Gene set enrichment analysis was performed to identify the signicantly changed biological pathways. The 16-lncRNAs-based risk In GSE49710 cohort, the high-risk a poorer OS than those in the low-risk group (P<0.001). Moreover, multivariate Cox regression analysis demonstrated that risk was an independent risk factor (HR=6.201;95%CI:2.536-15.16). The similar prognostic powers of the 16-lncRNAs were also achieved in the external cohort and in stratied analysis. In addition, a nomogram was established and worked well both in the internal validation cohort (C-index=0.831) and external validation cohort (C-index=0.773). The calibration plot indicated the good clinical utility of the nomogram. Gene set enrichment analysis (GSEA) indicated that high-risk group was related with cancer recurrence, metastasis and inammatory associated pathways. probabilities for the nomogram.


Introduction
Neuroblastoma originates from neural stem cells, is one of the most common malignant tumors in children [1]. Risk-based therapeutic methods combined with surgery is an important component of therapies for neuroblastoma, which requires an optimization of overall survival estimation for speci c patient [2]. Indeed, patients can be generally classi ed into low-risk or high-risk groups depending on their tumor MYCN status and International Neuroblastoma Staging System (INSS). However, it does not account for several known prognostic factors (eg, age, sex) and unknown prognostic factors (eg, molecular biomarkers). The conventional prognostic factors such as the INSS stage and MYCN status are insu cient to predict neuroblastoma patient outcome, treatment outcome remains undesirable [3].
Due to the high levels of heterogeneity of neuroblastoma, the prognosis remains quite unsatisfactory [4].
Several studies have attempted to integrate multiple biomarkers into a single model and improve speci city and individual risk assessment [5]. The discovery of new reliable molecular biomarkers will guide the prognostic evaluation of neuroblastoma patients. Therefore, a more accurate molecular biomarker for precise prognostic prediction are needed. Currently, with the development of RNAsequencing (RNA-Seq) and bioinformatics, the roles of dysregulated functional long noncoding RNAs (lncRNAs) in human cancers have received considerable attention [6]. The lncRNAs are non-coding RNAs that range in length from 200 nucleotides to over 100 kilobases, which has been identi ed to serve a critical role in various types of cancer, and some of them have been implicated in diagnosis and prognostication [7]. Although some lncRNAs for neuroblastoma prognosis prediction have been tested in basic medicine trials such as lncRNA SNHG16 [8], lncNB1 [9], FOXD3-AS1 [10], more potential and valuable lncRNAs are urgent to be discovered.Integrating multiple lncRNAs into a lncRNA signature model might create predictive and prognostic value in the management of neuroblastoma.
A nomogram is a graphical analogue computation device that can be used to evaluate the prognostic biomarkers in oncology and medicine. Nomograms makes the prediction model readable by generating an individual probability of a clinical event, leading us closer toward personalized medicine [11]. In this study, we conducted a comprehensive analysis of lncRNAs pro ling of neuroblastoma based on a bioinformatics method. By using the least absolute shrinkage and selection operation (LASSO) algorithm, a novel lncRNAs signature was built to predict the survival outcomes of patients with neuroblastoma. Furthermore, we constructed a nomogram to accurately predict the prognosis for neuroblastoma patients.

Methods
Neuroblastoma datasets and differential expression analysis A owchart which depicted the whole process of our analysis was rstly plotted in Figure 1. Raw microarray data from GSE49710 was directly downloaded from Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). GSE49710 was conducted by GPL16876 Agilent-020382 Human Custom Microarray 44k. The inclusion criterias included: patients pathologically diagnosed with neuroblastoma; without other malignant tumor history; complete follow-up information. Patients were excluded if the clinical information was incomplete. Total 498 neuroblastoma samples with complete follow-up information were obtained. The comprehensive gene annotation for lncRNA was downloaded from the GENCODE (release 32). All the long non-coding RNAs were extracted from GPL16876 for the preliminary screening of prognostic lncRNAs. The approach of lncRNA pro le analysis mainly referred to previous study [12]. Finally, 3798 annotated lncRNA transcripts were generated. We used the limma package in R software to identify differentially expressed lncRNAs (P-value < 0.05 and |log 2 Fold-change (FC) |>1) [13].
Construction of the lncRNA-based prognostic signature The LASSO can be used to select optimal features in microarray data with a robust prognostic value [14].
After screening out the differentially expressed lncRNAs, we carried out LASSO regression to select the lncRNAs for prediction of the overall survival. The "glmnet" package in R software was used to perform the LASSO regression analysis. Coe cients from LASSO regression were used to construct prognostic signatures. Then the lncRNA-based LASSO risk score of each sample was calculated according to expression levels of the lncRNAs multiply by LASSO coe cients. Finally, a multivariate Cox regression analysis was conducted to assess the contribution of a lncRNA-based LASSO risk score as an independent prognostic factor for neuroblastoma patient overall survival. The optimal cut-off of LASSO risk score was obtained using "survminer" package (https://CRAN.R-project.org/package=survminer) in R. Then the neuroblastoma patients were divide into the high risk group and low risk group based on the median LASSO risk score value. Kaplan-Meier curves were used to compare the two groups regarding their survival outcomes with the assistance of the log-rank test. Receiver operating characteristic (ROC) curve analysis was employed to assess the performance of survival prediction. A P-value <0.05 was considered signi cant.

External data validation
To further verify the predictive value of the lncRNA-based LASSO risk score, another independent dataset from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) database as the validation series. TARGET Neuroblastoma, a gene expression array in TARGET database that contains the expression values for part of the differentially expressed lncRNAs, as well as prognosis information for 190 neuroblastoma cases, was enrolled as a validation set (https://ucscpublic.xenahubs.net). Survival curves were plotted using the Kaplan-Meier method.

Construction of the nomogram
A composite nomogram predicting the overall survival of neuroblastoma was generated by the "rms" package(https://CRAN.R-project.org/package=rms) of R statistical software. The nomogram models were internally validated by use of bootstrap with 200 resamples [15]. The concordance index (C-index) was used to estimate predictive performance of the nomogram. Calibration plots were performed to observed and predicted probabilities for the nomogram.

Gene set enrichment analysis
To elucidate the potential in uence of lncRNAs on the neuroblastoma, an enrichment analysis was conducted between the high-risk and low-risk groups using the Java GSEA implementation (http://www.broadinstitute.org/gsea). Data used for GSEA were accessible from GSE49710. The expression pro le data of 498 neuroblastoma cases were sorted in terms of lncRNA-based LASSO risk score. Standard settings with 1000 runs of gene permutations were employed. A normalized enrichment score (NES) is calculated for each gene set based on the size of the set. Nominal p-value and FDR q-value<0.05 was chosen as the cut-off criteria [16].

Statistical analysis
The distribution of overall survival was estimated by the Kaplan-Meier method and compared by log-rank test. Cox proportional hazards models were used to estimate hazard ratios and 95% con dence intervals (CIs). The P value <0.05 was considered statistically signi cant. All statistical tests were performed with R software (Version 3.6.1).

Identi cation of prognostic lncRNAs
We developed a ow diagram to describe our study ( . We then calculated the 16-lncRNAbased LASSO risk score for each patient in the GSE49710, and ranked them according to their LASSO risk score. The median score (0.0071) of the whole series was adopted as the cut-off value. As such, patients were divided into a high-risk group (n=249) or a low-risk group (n=249). Patients in the high-risk group had signi cantly shorter overall survival than those in the low-risk group (log-rank test P < 0.0001) (Fig3A). ROC curve indicated the LASSO risk score had a comparative ability in predicting overall survival (Fig3B). Further, in the univariate cox regression model, the LASSO risk score was a strong variable correlated with prognosis in validation datasets. After multivariate adjustment by clinical factors, the LASSO risk score and INSS stage remained a powerful and independent prognostic factor (Table2). We strati ed patients into different group based on age (< 18 months or ≥ 18months), INSS stage (I + II or III + IV + IVs) and MYCN status (ampli ed or without ampli ed). These results showed that the 16-lncRNAbased LASSO risk score can still separate neuroblastoma patients into high-risk or low-risk group, and patients in high-risk group have shorter OS and poor prognosis than those in low-risk group (Fig4).

External validation of the LASSO risk score
To further assess the predictive prognostic value of the LASSO risk score, total 190 neuroblastoma patients without adjuvant therapy from the TARGET database (external validation series) were used to validate our results. Patients were separated into high-risk and low-risk group based on the LASSO risk score. Patients with high LASSO risk score had a higher mortality rate than those with low LASSO risk score (Fig5A). ROC curve indicated the LASSO risk score had a comparative ability in predicting overall survival (Fig5B). The results from univariate and multivariate COX regression analyses showed that the identi ed independent risk factors for OS included the LASSO risk score and INSS stage (Table 3).

Nomogram development
To predict the overall survival probability of patients with neuroblastoma using a quantitative method, we assembled a nomogram that integrated both the 16-lncRNA-based LASSO risk score and various clinical

Gene set enrichment analysis
We performed GSEA to identify correlated biological process and signaling pathways using the 16-lncRNA signature on the basis of risk score for classi cation. Signi cant gene sets were selected: E2F targets, In ammatory response, KRAS signal pathway, TNF-α signal pathway. We proposed that the 16-lncRNAs might produce a marked effect by in uencing these process and pathways (Fig8).

Discussion
Neuroblastoma is a common tumor in infants and young children. Neuroblastoma is a group of heterogeneous tumors with different prognosis. Clear clinical risk grouping is important for making treatment plan and evaluating prognosis. Precision medicine is a medical model that customizes treatment plans for patients according to their characteristics in genome, phenotype, and biomarkers [17]. Nomogram is a statistical model that can be used for clinical events predictive analysis. Compared with other predictive statistical methods, nomogram can provide better prognostic risk assessment in a visual way. To date, some nomograms have been constructed to predict the prognosis of patients with cancer.
Base on the cancer genome atlas database, some researchers developed a nomogram that incorporate an autophagy-related signature for predicting the survival of glioblastoma patients [18]. Another previous study has constructed a nomogram that could be used to predict the prognosis of uterine corpus endometrial carcinoma [19]. In an effort to guide risk-based treatment for patients with neuroblastoma, we developed and validated nomograms that were based on combinations of clinical and molecular prognostic markers. The nomogram may help perform risk strati cation and provide more individualized clinical advice for each patient.
Recent studies have shown that LncRNA can regulate many important life activities, and it is also closely related to the occurrence and development of tumor [20]. However, the prognostic value of lncRNAs in neuroblastoma is not entirely clear. In this research, we constructed and validated a 16-lncRNA-based LASSO risk score to predict overall survival for patients with neuroblastoma. Patients in the low-risk group had signi cantly better survival than those in the high-risk group. Our research highlighted the prognostic value of the 16-lncRNAs and suggested practical applications in prognostic prediction of neuroblastoma. The lncRNA FOXD1-AS1 functions as an important oncogenic lncRNA in glioma and affected biological processes via protein eIF5a [21].In breast cancer, higher lncRNA ZFHX4-AS1 expression is associated with worse prognosis, and lncRNA ZFHX4-AS1 acts as an oncogene through the Hippo signaling pathway [22].In non-small cell lung cancer, lncRNA PCAT7 can promote cell proliferation and induce epithelial-mesenchymal transition, it has been well documented as a key regulator of tumour growth and development [23]. The biological function of the remaining lncRNAs in our study has not been investigated in previous studies. Further researches are required to explore the underlying molecular mechanisms of these lncRNAs. These data might improve the development of a cheap molecular test. In our study, strati ed analysis were performed to further evaluate whether the 16-lncRNA-based LASSO risk score exhibit predictive effect within same clinical characteristics. We strati ed patients into different group based on age, INSS stage and MYCN status. These results showed that the 16-lncRNA-based LASSO risk score can still separate neuroblastoma patients into high-risk or low-risk group, and patients in high-risk group have shorter overall survival and poor prognosis than those in low-risk group. These results demonstrated that this 16-lncRNA-based LASSO risk score was independent risk factors for survival prediction of neuroblastoma patients and could stratify patients from different group into subtypes with different prognosis. Thus, the ability of our 16-lncRNAs in identifying subgroups of neuroblastoma patients implies that the 16-lncRNAs may be used to re ning the current prognostic model and promoting further strati cation of patients in the future clinical research.
Most of these dysregulated lncRNAs are not yet functionally annotated. However, we can infer the underlying regulatory function of the lncRNAs using the mRNA expression data of the same group of patients. In order to identify a group of pathways signi cantly enriched in high-risk group with respect to low-risk group, we performed gene set enrichment analysis (GSEA) using the Java GSEA implementation. The neuroblastoma patients in GSE49710 were assigned into two groups (high risk vs. low risk) according to the optimal cut-off of LASSO-risk-score. Here, the 16-lncRNAs identi ed in our study are involved in biological pathway known to be crucial to the relapse of neuroblastoma, namely KRAS signalling pathway. A study involving 32 samples have reported that KRAS mutations have been found in some cases of relapse neuroblastomas [24]. E2F-mediated associated pathways were signi cantly associated with the recurrence of cancer. Abnormal copy number of G1-cell cycle related genes is frequently found in neuroblastoma and increased expression of E2F target genes [25]. Metastasis is a hallmark of malignant neuroblastoma and is the main reason for therapeutic failure. TNF-α is a proin ammatory cytokine that activates the nuclear factor-κB signaling pathway, leading to CXCR4 overexpression, fostering neuroblastoma cell metastasis [26]. Thus, it is a plausible inference for the 16-lncRNAs associated with survival of patients with neuroblastoma. The potential molecular function may put forward the direction for the further study on the mechanism of neuroblastoma progression.
There are some limitations in our study. We cannot exclude the possibility of residual confounding after internal validation as a result of possible over tting from variable and threshold selection for these models. So, internal validation with bootstrapping and external validation were used to address these concerns. In addition, the underlying mechanisms of these lncRNAs in our risk score remain unclear. Further cell and animal studies are needed to con rm the exact molecular mechanisms of these lncRNAs.
In conclusion, this study identi ed a novel and robust lncRNA-based LASSO risk score for prognostic prediction of neuroblastoma by mining currently available microarray data. This risk score may contribute to personalize prediction of neuroblastoma prognosis and acted as potential biomarkers. A nomogram combining the molecular signature and clinical factors was constructed. The nomogram that contain the 16-lncRNAs allows for accurate risk assessment and guides future clinical planning regarding patient surveillance.