Bioinformatics analysis combined with experiments to explore neuroblastoma prognostic indicators based on immune-related genes.

Background ： Due to the extremely high mortality rate of children with high-risk Neuroblastoma (NB), there is an urgent need for new indicators to further classify children in the high-risk group for more precise treatment. The purpose of our research is to explore the immune-related genes in NB in the high-risk group, and to further identify and develop a prognostic nomogram based on immune IRG signatures. Methods ： Through bioinformatics analysis to explore the abnormal expression of immune-related genes in the high-risk group. Cox regression and the least absolute shrinkage and selection operator (LASSO) analysis were conducted to identify the immune and overall survival (OS) related mRNA. The accuracy of the risk score is evaluated by Kaplan-Meier method and receiver operating characteristics (ROC) analysis, which is used to build a nomogram in combination with other clinical characteristics.. Quantitative real-time polymerase chain reaction (qRT-PCR) was conducted to detect the accuracy of our results. Results ： A total of 127 common differentially expressed immune genes were found between the high-risk group and the non-high-risk group of the two data sets. Four immune-related genes (IRG) related to prognosis were identified and a risk score was established. Kaplan–Meier survival analysis and time-dependent ROC analysis showed that the 4-IRG risk score has satisfactory predictive potential and achieved consistency in the verification of external data sets. Subsequently, the risk score combined with clinical characteristics draws a nomogram. The reliability of the results was verified on 29 cases of NB tissues by qRT-PCR. patients into low- and high-risk groups with poor prognosis, and draw a nomogram for children in the high-risk group. This feature can help select high-risk patients who need more aggressive adjuvant target therapy or immunotherapy.


Background
Neuroblastoma (NB) is the most common extracranial solid tumor in children, accounting for 7% of all pediatric tumors in patients under 15 years of age, and 15% of all pediatric deaths caused by cancer. The world mortality rate is 0.85-1.1 per 100,000 children under 15 years of age [1]. At present, the overall survival rate (OS) of low-and medium-risk NB is above 90% [2,3]. Despite the use of multiple treatments, the long-term survival rate of high-risk NB is less than 50% [4][5][6]. This may be due to the heterogeneity of NB [7][8][9].
The immune system plays an important role in tumorigenesis, tumor development and metastasis [10].In the past few years, the clinical application of immunotherapy in oncology has received considerable attention. So far, Anti-GD2 immunotherapy can significantly improve the event free survival (EFS) and overall survival (OS) of high-risk NB [11,12]. CAR T Cell Therapy has also made some progress in the experimental phase [13]. Most previous studies have focused on the function of proteins in this process, but there are still relatively few further studies on the specific functions of RNA.
So far, Chen et al. have initially identified the differentially expressed genes between neuroblastoma tissue and control cell lines [14]. Wang et al. studied the prognostic significance of MYCN related genes in NB [15]. Zhong et al. did a prognostic analysis on the differentially expressed genes in stage 4 NB tissue [16]. It has not been found that there are immune-related mRNAs in the prognosis of high-risk NB. Therefore, it is important to confirm the completeness of the mRNA involved in regulating the immune response. In the immune system, mRNA is of great significance not only in providing reasonable immunotherapy methods, but also in providing accurate treatment options for tumors.
In the current study, we have explored immune-related genes from TARGET and GEO databases. In addition, we found that 4-IRG can predict survival and be used as a risk score. Kaplan-Meier survival analysis and time-dependent ROC analysis showed satisfactory prediction potential of the risk score, which was verified in the external data set. Next, the comparison of risk score and clinical factors confirmed the superiority of risk score. We combined the risk score with clinical characteristics to draw a nomogram for predicting the prognosis of high-risk children. Finally, the reliability of the results was verified on 29 cases of neuroblastoma tissues by qRT-PCR.

Gene expression and clinical data acquisition
We obtained 209 neuroblastoma gene expression data (excluding 40 samples without clear pathological diagnosis) from TARGET (https://ocg.cancer.gov/) that contained clinical information for the establishment of the model. The gene expression matrix files and clinical information of the GSE49710 and GSE73517 datasets obtained from the Gene Expression Comprehensive Library (GEO) were used for the screening of differential genes, which contained 498 and 50 samples, respectively. A dataset of 149 cases containing clinical information for external verification was downloaded from UCSC xena (http://xena.ucsc.edu/).

Identification and validation of the prognostic IRGs score
Univariate Cox analysis was used to screen DE-IRG (P<0.05) that was significantly related to overall survival (OS) in the TARGET neuroblastoma dataset. Univariate analysis results showed that meaningful genes undergo least absolute shrinkage and selection operator (LASSO) logistic regression. Subsequently, genes with non-zero coefficients were selected for multivariate Cox analysis to identify independent prognostic genes. Use these independent predictive genes (P<0.05) to further calculate the IRGs score of OS: IRG score=β1* X1 + β1* X1 +…+βn* Xn (β: coefficient derived from multiple regression; X: gene expression value ). The median risk score was selected as the cutoff value of the TARGET neuroblastoma dataset, which was also used for validation in the UCSC dataset. Using Kaplan-Meier analysis, the area under the curve (AUC) of the receptor operating characteristic (ROC) curve was used to evaluate the performance of the prognostic gene signature.

Gene set enrichment and pathway analysis (GSEA)
To illustrate the biological functions of the prognostic genes in high-risk and low-risk patient groups, GSEA was performed in java GSEA (version 3.0) based on the Molecular Signatures Database version 6.2 [18]. With the 209 NB samples in TARGET dataset, KEGG pathways, biological processes, cellular components, molecular functions associated with high-risk and low-risk groups were identified by using C2 (curated gene sets), C5 (GO gene sets). FDR q value < 0.05, |NES|> 1 were considered statistically significant.

Development and validation of the nomogram
The predictive factors for making a nomogram include age, stage, gender, MYCN status and IRGs scores. Concordance index and calibration curves are used to evaluate the discrimination and accuracy of the nomogram.

Validation of DKK1、HTR1A、PLXAN1 and ULBP2 expression
Use univariate cox regression to analyze DKK1, HTR1A, PLXAN1 and ULBP2 and plot their optimal cut-off value, median value, Kaplan-Meier survival analysis and ROC curve grouped by quartile values. Cbioportal (http://www.cbioportal.org/) is a web server that can analyze the relationship between DKK1, HTR1A, PLXAN1, ULBP2 and COG risk. Collect NB samples and perform quantitative real-time polymerase chain reaction (qRT-PCR) verification. After obtaining approval from the Ethics Committee of Children's Hospital of Chongqing Medical University, we collected tumor tissues from 20 high-risk children and 9 non-high-risk children. All patients were diagnosed with primary neuroblastoma and did not receive any treatment before surgery. All tumor tissues were quickly frozen in liquid nitrogen after being isolated and stored at −80°C. Total RNA was extracted from the sample using TRIzol reagent (Servicebio，Wuhan， China), and then reverse transcription and PCR reactions were performed using ReverTra Ace qPCR RT-PCR kit (Servicebio，Wuhan，China). Total RNA was extracted from the sample using TRIzol reagent (Servicebio，Wuhan，China), and then reverse transcription and PCR reactions were performed using Servicebio®RT First Strand cDNA Synthesis kit(Servicebio， Wuhan， China) and 2×SYBR Green qPCR Master Mix(Servicebio，Wuhan，China). The sequences of the applied primers are listed in Supplementary Table S1. Then all steps were performed according to the manufacturer's instructions. Finally, the result was standardized to GAPDH and calculated by the 2 −ΔΔCt method.

Statistical analysis
The differentially expressed genes between high-risk and not high-risk tissues were analyzed by R software package "limma". Cox correlation analysis was used to confirm the relationships between the selected 4-lncRNA and patient outcomes. Survival curves were drawn with the survival package for R. P < 0.05 was regarded to indicate statistically significant differences.

Screening of differentially expressed IRGs on neuroblastoma
The entire work flow of the research is shown in Figure 1. By comparing the expression profiles of the neuroblastoma high-risk group and non-high-risk group in the GSE49710 and GSE73517 data sets, 1569 and 1382 genes were determined to be DEG through volcano graph analysis (logFC≥1, FDR≤0.05). 2A, B). A total of 1812 immune-related genes (IRG) were downloaded from the Immport database. The 127 candidate genes defined as DE-IRGs overlapped between DEGs and IRGs and were visualized by a Venn diagram ( Figure 2C). Finally, through the TARGET data set and UCSC data set, 118 DE-IRGs were determined to be used for prognostic identification.

Functional enrichment and PPI network analysis of DE-IRGs.
The functions of 127 DE-IRGs were found by analysis of KEGG and GO enrichment pathways ( Figure 2D, E). Through KEGG analysis, DE-IRGs significantly enriched the biological processes related to Cytokine-cytokine receptor interaction and Hematopoietic cell lineage. GO analysis showed that DE-IRGs enriched plasma membrane, immune response and protein binding. A PPI network of the 127 DE-IRGs was established, where 125 nodes and 699 interactions were constructed, to identify the interactions between genes (Figure 3a). The top 30 candidate genes were identified to be significantly involved in the network ( Figure 3B). Module analysis recognized related clustering modules in the PPI network ( Figure 3D). With the DE-IRGs clusters, GO analysis was applied for functional enrichment. The results from the PPI network and pathway analysis suggested the plasma membrane, immune response, and Chemokine signaling pathway, were densely connected and enriched in the DE-IRGs( Figure 3C,E).

Identification of DKK1、HTR1A、PLXNC1 and ULBP2 as the independent prognostic DE-IRGs
The TARGET neuroblastoma dataset (training) and the UCSC dataset (validation) were used to identify survival-related genes after 118 candidate DE-IRGs were identified. Supplementary Table 2 summarizes the clinical characteristics of these two datasets. Univariate Cox analysis was used to analyze 118 DE-IRGs in the TARGET neuroblastoma data set. Among them, 33 DE-IRGs were significantly correlated with patient survival (P <0.05) (Supplementary Table 3). Next, through LASSO penalty regression to avoid the collinearity of multiple variables, 11 (λ min=0.02614) and 5 (λ 1se=0.07273) DE-IRG survival models were obtained ( Figure 4A). The coefficient of each gene in the TARGET neuroblastoma data set is shown in the figure ( Figure 4B). ROC curve and box plot are used to evaluate the pros and cons of the two models ( Figure 4C, D). Eleven DE-IRGs were selected, and multivariate Cox regression was performed to find out the relationship between gene expression and patient OS. Among them, DKK1 (HR = 0.64, 95% CI 0.44-0.92, P = 0.01616), HTR1A (HR = 2.54) , 95% CI 1.31-4.93, P = 0.006), PLXNC1 (HR = 0.66, 95% CI 0.48-0.90, P = 0.00918) and ULBP2 (HR = 0.57, 95% CI 0.33-0.96, P = 0.03313) are independent prognostic genes ( Figure 4E).

Development and validation of IRGs score model
The independent prognostic genes DKK1、HTR1A、PLXNC1 and ULBP2 were chosen to establish a risk score model. From multivariate Cox regression, the coefficients for DKK1、 HTR1A、PLXNC1 and ULBP2 were -0.44768、0.93075、-0.42222 and -0.57078 respectively. Therefore, the IRG score of each patient was calculated according to the formula: IRGs score = ( -0.44768) × (expression value of DKK1) + 0.93075 × (expression value of HTR1A)+ ( -0.42222) × (expression value of PLXNC1) + ( -0.57078) × (expression value of ULBP2) . According to the median risk score (-3.11074) in the TARGET neuroblastoma dataset, patients were divided into high-risk and low-risk groups. Kaplan-Meier survival analysis showed that the OS of patients in the high-risk group was significantly shorter than that of patients in the low-risk group ( Figure 5A). The ROC curve was used to evaluate the 1-, 3-, and 5-year prediction performance of the risk score ( Figure 5C). The results showed that the 1-, 3-, and 5-year AUC values were 0.738, 0.719, and 0.688, respectively. The distribution of risk score, OS, DKK1, HTR1A, PLXNC1, and ULBP2 expression is also shown, which indicates that patients with high risk scores have lower expressions of DKK1, PLXNC1, and ULBP2, higher HTR1A expression and more deaths ( Figure 5E). In addition, in order to confirm that the risk score formula is better than other commonly used clinical parameters (gender, age at diagnosis, INSS staging, MYCN amplification status), we used the roc curve to compare their 135-year overall survival rate and the predictive performance of event-free survival rate. Results It shows that under all conditions, the risk score is better than other parameters (Supplementary Figure 1A, B).
According to the same formula and the above cut-off value, patients in the UCSC data set used for external verification were divided into high-risk and low-risk groups for comparison. Consistent with the previous results, the survival rate of high-risk patients was significantly lower than that of low-risk patients ( Figure 5B). The 1-, 3-, and 5-year AUC values of the ROC working curve were 0.606, 0.676, and 0.620, respectively ( Figure 5D). In addition, the expression of DKK1, PLXNC1 and ULBP2 in the high-risk group was reduced, and the number of deaths in the high-risk group was also higher than that in the low-risk group in the UCSC validation data set ( Figure 5F).

Gene set enrichment and pathway analysis for DE-IRGs
To investigate the underlying molecular mechanism of the IRGs signature, we conducted GSEA comparing the high-risk group with the low-risk group in 209 NB patients from TARGET. The high-risk group has only 1 rich GO pathway and 1 rich KEGG pathway. However, 148 and 25 pathways in the GO and KEGG analysis were determined to be related to the low-risk group, and the top 10 important terms for each module are summarized in Supplementary Table S4. The results showed that the high-risk group may be related to KERATIN FILAMENT and MATURITY ONSET DIABETES OF THE YOUNG, and the low-risk group may be related to vacuum membrane, go ligand dependent nuclear receptor transcription coactivator activity and neurotrophin signaling pathway (Supplementary Figure 2).

Development and evaluation of nomogram based on IRG and clinicopathological risk factors
In order to make prognostic judgments for high-risk neuroblastoma, we deleted 30 non-high-risk samples from the TARGET data set and included 179 COG high-risk group samples to construct a nomogram. Prognostic factors include gender, age at diagnosis, INSS stage, MYCN amplification status, and risk score to construct a nomogram to predict 1-year, 3-year, 5-year overall survival rate and median overall survival time ( Figure 6A, B) . The concordance index of the nomogram is 0.633. The calibration curve shows that the nomogram has good agreement between the predictions of 1-year, 3-year, and 5-year overall survival rates and actual observations ( Figure 6C). However, when predicting the three-year overall survival rate, the risk of the low-risk group will be underestimated.

QRT-PCR verification of DKK1, HTR1A, PLXNC1 and ULBP2 and evaluation of independent predictive ability
Univariate analysis of DKK1, HTR1A, PLXNC1, and ULBP2 showed that although these four genes performed poorly in the Kaplan-Meier survival analysis grouped by the median, they were at the best cutoff value and quartile Kaplan-Meier survival analysis showed a good prognostic ability ( Figure 7A). We use qRT-PCR to further verify their expression in clinical tissues. Interestingly, compared with the tumor tissues of the low-risk group, the relative expression levels of signature genes in the tumor tissues of the high-risk group were significantly reduced ( Figure 7B). In addition, the relationship between the expression levels of DKK1, HTR1A, PLXNC1 and ULBP2 and the risk of COG in Cbioportal is also roughly in line with the results of qRT-PCR ( Figure 7C).

Discussion
Neuroblastoma (NB) is the cancer with the highest incidence of solid tumors in children, and the overall survival rate of high-risk children is less than 50% [19]. It is urgent to identify prognostic biomarkers to personalize the treatment plan for high-risk children. With the continuous exploration of the role of the immune system in tumorigenesis, the phenomenon that the immune system is activated in tumors and then further mediates immune cells to kill tumor cells has been widely recognized [20]. The expression of IRGs in tumors directly reflects the status of the tumor immune system. Therefore, the use of IRG as a prognostic factor for tumors is convincing. Our study is the first to establish a prognostic model for NB patients based on IRGs. With the development and wide application of next-generation" sequencing technology, NB has also established many other genetic prognosis models, such as those based on MYCN-related genes or INSS 4 stage-related genes [15,16]. Unlike other studies, we mainly focus on changes in the expression of IRGs and construct prognostic models for high-risk patients. In addition to having excellent prognostic value for the entire NB population, our research can also guide the clinical exploration of individualized treatment and immunotherapy for children with NB in the high-risk group to a certain extent.
We used public databases and bioinformatics methods to screen the most influential immune-related genes that affect patient survival. A total of nearly 1,000 samples (n = 962) were recruited. As far as we know, this is the largest cohort to establish a prognostic scoring system based on immune-related gene signatures in NB. In order to ensure the consistency between different data sets, we only use the gene microarray data generated by the GPL16876 platform (Agilent-020382 Human Custom Microarray 44k), so the heterogeneity within the cohort is eliminated to some extent. Based on machine learning algorithms, we have established a risk scoring system based on 4 genes to individually predict the prognosis of NB patients. The expression of these 4 genes has been verified on clinical organizations and external data sets. Combining several clinical parameters, we generated a nomogram that can predict the prognosis of high-risk patients. It is worth noting that the expression of the four genes has been verified in our clinical tissues, which means that the four genes in our model have better credibility. On the other hand, the prognostic factors in our nomogram are easy to obtain in clinical work, so these accessible parameters can be used to achieve relatively accurate prognostic survival predictions.
In this study, the expression of DKK1 was negatively correlated with poor prognosis. As a typical inhibitor of wnt/β-catenin, DKK1 plays a key role in tumor pathogenesis [21]. In a variety of cancers, DKK1 regulates tumor angiogenesis, mediates cancer metastasis, regulates chemoresistance, and shows potential as a serum biomarker for diagnosis and prognosis [22][23][24][25]. According to reports, in neuroblastoma, DKK1 is down-regulated by MYCN, and its expansion and overexpression are poor prognostic factors for neuroblastoma [26]. However, DKK1 is down-regulated by XIST through EZH2 induced H3 histone methylation, thereby promoting the growth, migration and invasion of neuroblastoma cells, and delaying tumor progression. PMID: 31208278. In addition, DKK1 is released by neuroblastoma cells, which can affect the balance between osteoblast production and osteoclast production, thereby contributing to the occurrence of osteolytic metastasis [27]. Our results show that higher expression of DKK1 indicates a higher survival time for patients, which has been confirmed by external verification. This indicates that DKK1 in NB may play a protective role as a whole, and the mechanism of DKK1 in NB deserves further study.
The HTR1A gene encodes the 5-HT1A receptor protein, which is one of the seven inhibitory G protein-coupled serotonin receptors [28]. In the past few years, studies have found that 5-HT1A receptor protein is involved in neurodevelopment [29] and is related to neuronal apoptosis [30]. This receptor protein plays a key role in neuron migration, axon growth and synapse formation, and affects the process of neurodevelopment [31]. On the one hand, 5-HT1A has a growth-promoting effect on carcinoid and aggressive cancers. It can induce tumor growth and survival of several cell types, including pancreas, liver, breast, bladder, lung, prostate and other cells [32]. On the other hand, the high expression of 5-HT1A is not only detected in prostate cancer cell lines and small cell lung cancer cell lines, but also in the pathological tissues of prostate cancer, bladder cancer and breast cancer patients by immunohistochemical staining. A positive result of 5-HT1A is given [33][34][35][36]. For now, there are few studies on HTR1A in NB. For example, Murata et al. found DNA methylation of HTR1A in neuroblastoma cell lines [37]. Bence et al. found that HTR1A may be related to transcriptional modulation of monoaminergic neurotransmission genes [38]. The above results suggest that HTR1A is related to the poor prognosis of tumors, which is also consistent with the result that HTR1A is a poor prognostic factor in NB, and HTR1A may be a potential prognostic biomolecular marker for NB.
PLXNC1 is a member of the transmembrane receptor family, and the corresponding gene is located on chromosome 12 in the human genome. Plexins are a part of the semaphorin gene superfamily, which includes the semaphorins and the receptor tyrosine kinases MET and RON [39]. Granja and König suggested that PLXNC1 plays an important role in inflammation, especially acute inflammation [40,41]. In the research on tumors, on the one hand, PLXNC1 was identified as a protective gene in acute myeloid leukemia, papillary thyroid cancer, and melanoma [42][43][44]. On the other hand, PLXNC1 is considered to be related to the poor prognosis of gastric cancer and liver cancer [45,46], and was mainly related to the aggressive phenotype in a pan-cancer study [47]. There are few studies on PLXNC1 in NB. Lebedev reported that PLXNC1 gene expression is correlated with the expression levels of TrkA or KIT in AML and NB, and has prognostic value for both cancers [48]. In our study, PLXNC1 was identified as a protective gene, which provides a reference for the clinical evaluation of patients. On the other hand, there are also research results showing that PLXNC1 is related to the poor prognosis of some cancers, which may promote the study of its adverse effects in neuroblastoma. In summary, PLXNC1 can indicate patient survival and is a potential biomarker for neuroblastoma.
ULBP2 is a NKG2D ligand, highly expressed in transformed cells [49] and is believed to provide a ligand for NK cells, which is conducive to tumor cell death [50][51][52]. However, studies have shown that overexpression of ULBP2 is associated with poor prognosis of cancers, including expression in melanoma, non-small cell lung cancer and ovarian cancer [53][54][55][56]. Hayakawa et al. believed that the upregulation of ULBP2 would lead to a bad result may be due to the secretion or cleavage of this protein is related to reduced NK mediated tumor lysis [52]. In the study of ULBP2 in NB, the expression of ULBP2 was detected in half of the cases of primary NB, but only in a few cell lines in NB cell lines [57]. The difference in the expression pattern between the primary tumor and the cell line may be related to the changes caused by the long-term culture of the cells or the sensitivity of the detection technology used. Our research shows that ULBP2 plays a protective role in NB, and based on current knowledge, the specific function of the interaction between ULBP2 and NK is still unclear.
To be sure, our analysis has some limitations. First of all, our analysis is based on expression at the mRNA level, without considering the expression at the protein level or post-transcriptional modification, which also has important biological effects. In addition, our IRGs are screened in the differential genes between high-risk patients and non-high-risk patients. However, we may have overlooked some genes with key prognostic value, because these genes are not necessarily different between children in high-risk groups and children in non-high-risk groups. Finally, there is a lack of validation in more neuroblastoma cohorts, which is limited by the availability of data.

Conclusions
In summary, our team established a risk scoring formula based on 4 immune-related gene markers in neuroblastoma, where a high score was significantly associated with a poor prognosis, which was verified in an external cohort. Combining several other clinical information to draw a nomogram, it can be used as a powerful and accurate tool to predict the survival rate of high-risk patients. It can help clinicians to more accurately choose the best treatment plan for NB patients.

Supplementary Material
Supplementary  Figure S1: ROC curve of risk score, INSS staging, MYCN status, age at diagnosis and COG risk stratification for 1-year, 3-year, 5-year overall survival and event-free survival.