NUF2 Correlates with Poor Prognosis in Non-Small Cell Lung Cancer: A Systematic Multi-Omics Analysis

Background: Lung cancer is one of the most common malignant tumors and a leading cause of cancer-related death worldwide. NUF2, a component of the nuclear division cycle 80 (NDC80) complex, belongs to a conserved protein complex associated with the centromere. Although high NUF2 expression has been reported in many human cancers, no study has performed a systematic multi-omics analysis of NUF2 in non-small cell lung cancer (NSCLC). Methods: Differential expression of NUF2 in NSCLC was evaluated in the GEO, TCGA, Oncomine and UALCAN databases, and a Kaplan-Meier prognosis analysis of NUF2 was performed. R was used to analyze differentially expressed genes, functional annotation and protein-protein interactions. Gene set enrichment analysis (GSEA) of the differentially expressed genes was also carried out. The characteristics of NUF2, its multi-omics features and its correlations were explored using UALCAN, cBioPortal, GEPIA and TIMER. Results: NUF2 expression was significantly higher in tumor tissues than in normal tissues. Analysis of TCGA and UALCAN samples showed that NUF2 expression was associated with stage, TNM stage and smoking habits. The overall survival curves indicated that high NUF2 expression was associated with poor prognosis. Furthermore, immune infiltration analysis showed that NUF2 was correlated with immune cells, and the NUF2-altered group had a poorer prognosis than the unaltered group in NSCLC. Conclusion: Our results demonstrate that data mining efficiently reveals NUF2 expression and its potential regulatory mechanisms in NSCLC, laying a foundation for further study of the role of NUF2 in the diagnosis and treatment of NSCLC.


Introduction
Lung cancer is one of the most common malignant tumors and a leading cause of cancer-related death worldwide. The five-year survival rate of lung cancer patients depends mainly on disease stage and region, ranging from 4% to 17% [1]. Because treatment options were limited, lung cancer usually did not require further morphological classification. Therefore, over the past few decades, non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) have been the most commonly used diagnostic terms for lung cancer [2]. NSCLC accounts for approximately 80-85% of lung cancers, including approximately 40-50% lung adenocarcinoma (LUAD) and 20-30% lung squamous cell carcinoma (LUSC) cases. To date, research on NSCLC has identified many target genes associated with tumor metastasis, invasion and prognosis, such as EGFR, KRAS, ALK and ROS1. Targeted therapies against EGFR, ALK and ROS1 have therefore long been first-line treatments for NSCLC. More recently, drugs targeting PD-1 and PD-L1 have become new treatment options in NSCLC. However, because prognosis remains poor, there is an urgent need to discover new targets or biomarkers for NSCLC.
The nuclear division cycle 80 (NDC80) complex is a heterotetrameric protein complex, and NUF2, an important component of the NDC80 complex, is essential for kinetochore-microtubule attachment and chromosome segregation [3]. Previous evidence showed that NUF2 acts as a prognostic biomarker and therapeutic target in hepatocellular carcinoma, breast cancer and oral cancer [4][5][6], and that NDC80 complex genes might be early indicators for the diagnosis and prognosis of lung adenocarcinoma [7]. These studies suggest that NUF2 is a promising candidate biomarker and therapeutic target in cancer, but the role of the NUF2 gene in NSCLC has not been systematically explored from multiple aspects.
In recent years, a growing number of online platforms, databases and data sets have enabled cancer researchers to conduct bioinformatics analyses of cancer using multi-omics data. To better understand the role of the NUF2 gene in NSCLC, we analyzed its expression profile, prognostic value, DNA methylation, gene mutation, immune infiltration and clinicopathological associations using publicly accessible cancer genomics databases. This work aims to lay a foundation for identifying new markers and for understanding the mechanisms underlying the occurrence and development of non-small cell lung cancer.

Oncomine Database Analysis
The Oncomine database (https://www.oncomine.org/resource/main.html) compiles 65 gene chip data sets, 4700 chips and 480 million gene expression data points [8]. It is a large, comprehensive database designed to facilitate data mining. We used this database to evaluate the transcription level of NUF2 in multiple tumors, especially lung cancer.

UALCAN Database Analysis
UALCAN (http://ualcan.path.uab.edu/index.html) is a comprehensive, interactive web platform for analyzing cancer omics data [9]. It helps researchers perform biomarker identification, expression profile analysis and survival analysis of genes of interest, and it also links to information in the TCGA database. Overall, it is a simple, fast and effective web tool for mining and analyzing TCGA data. In this study, we used UALCAN to evaluate the expression profile of the NUF2 gene, its DNA methylation and its clinicopathological associations.

GEO and TCGA Database Analysis
To further verify the analyses in Oncomine and UALCAN, we downloaded NSCLC data from the GEO and TCGA databases. We accessed the TCGA data portal to download mRNA expression quantification profiles (HTSeq-FPKM), and the mRNA expression profiles of GSE77803 and GSE32863 were acquired from the GEO database. The TCGA and GSE77803 datasets contained adenocarcinoma and squamous cell carcinoma samples, while GSE32863 contained only adenocarcinoma samples. Clinical variables, such as stage, age and AJCC TNM cancer stage [10], were used to assess the correlation between NUF2 expression and clinical parameters.

Kaplan-Meier plotter
The Kaplan-Meier plotter (KM plotter) is widely used for single-gene or multi-gene prognostic analysis of various malignant tumors [11]. We assessed the effect of NUF2 expression on survival in NSCLC and in its subtypes (adenocarcinoma and squamous cell carcinoma). We further explored the effect of NUF2 expression on survival in NSCLC subgroups with enriched or decreased immune cells.
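The survival curves produced by such a tool rest on the Kaplan-Meier (product-limit) estimator. The following is a minimal Python sketch of that estimator, not the KM plotter's own implementation; the function name and the toy follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- follow-up time of each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical toy cohorts (months of follow-up, event flag).
high_nuf2 = kaplan_meier([5, 8, 12, 12, 20, 30], [1, 1, 1, 0, 1, 0])
low_nuf2 = kaplan_meier([10, 18, 25, 40, 40, 60], [1, 0, 1, 0, 1, 0])
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimator can use incomplete follow-up.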

Functional annotation and Protein-protein interaction
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to evaluate the possible functions of the differentially expressed genes (DEGs) from GSE77803, with samples grouped by NUF2 expression. Protein-protein interaction (PPI) analysis for NUF2 was performed with the STRING database (https://string-db.org/) [12]. The Molecular Complex Detection (MCODE) plug-in for Cytoscape was used to find clusters (highly interconnected regions) in the network.
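Over-representation analyses of GO terms and KEGG pathways are commonly based on a hypergeometric (one-sided Fisher) test. The sketch below illustrates that test in Python under this assumption; it is not the R workflow used in the study. The background size and pathway size are hypothetical, while the DEG count (262 + 164) and the overlap of 18 genes follow the numbers reported for the cell cycle pathway.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical background (20000 genes) and pathway size (124 genes).
p_cell_cycle = hypergeom_enrichment_p(
    n_background=20000, n_pathway=124, n_degs=426, n_overlap=18
)
print(f"enrichment p-value: {p_cell_cycle:.2e}")
```

Because the background and pathway sizes here are assumed, the printed value is not expected to reproduce the P-value reported in the Results.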

Gene-set Enrichment Analysis (GSEA)
To further verify the KEGG pathway enrichment analysis, gene set enrichment analysis was performed using the GSEA program (v.4.0.3) [13]. The gene sets used were the H (hallmark) collection and C2.CP.KEGG (KEGG gene sets) from the Broad Molecular Signatures Database (MSigDB). The number of random sample permutations was set to 1,000, and the significance threshold was P<0.05.
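The running-sum statistic behind gene set enrichment analysis can be illustrated with a minimal Python sketch. It is not the GSEA program's code: it uses the unweighted (Kolmogorov-Smirnov-like) form of the enrichment score and, for brevity, permutes gene labels rather than the sample labels permuted in the analysis described above. The gene names are hypothetical placeholders.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best

def permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical two-sided p-value from shuffling the gene order."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical ranking: 20 genes, with the gene set concentrated at the top.
ranked = [f"g{i}" for i in range(20)]
es, p_value = permutation_p(ranked, {"g0", "g1", "g2", "g3"})
```

A gene set whose members cluster at one end of the ranking yields a score near 1 or -1, whereas members scattered through the list keep the running sum close to zero.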

GEPIA database analysis
GEPIA (http://gepia.cancer-pku.cn/index.html) is a comprehensive online database that provides standardized analysis of RNA-seq data from 9,736 tumor samples and 8,587 normal control samples from the TCGA and Genotype-Tissue Expression (GTEx) datasets [14]. We first used GEPIA to analyze the correlation between NUF2 and selected genes of interest, and then used it to examine the correlation between NUF2 expression and immune cell subset markers.

c-BioPortal analysis
The cBio Cancer Genomics Portal (c-BioPortal) (http://cbioportal.org), which currently includes 225 cancer studies, is an open bioinformatics platform for interactively exploring, visualizing and analyzing multidimensional cancer genomics data sets [15]. We used c-BioPortal to analyze alterations of NUF2 and the other components of the NDC80 complex in TCGA NSCLC samples. The OncoPrint tab displays an overview of the alterations of these genes per sample in TCGA-NSCLC, while the Mutations tab displays the mutation sites of each gene in detail. We also used the Comparison/Survival tab to explore the difference between the gene-altered group and the unaltered group.

TIMER database analysis
TIMER (https://cistrome.shinyapps.io/timer/) is a web platform for comprehensive analysis of tumor-infiltrating immune cells [16]. It allows the user to input function-specific parameters and dynamically displays the resulting plots, making it easy to obtain the immunological, clinical and genomic characteristics of tumors. In our study, we used it to explore the association of NUF2 expression with the infiltration of immune cells (B cells, CD4+ T cells, NK cells, CD8+ T cells, Th1 cells, neutrophils, monocytes, macrophages, Tregs, dendritic cells, etc.) and with their subgroup markers.

Differential expression of NUF2 in NSCLC
We developed a flow diagram to show our process (Fig. 1). We first compared NUF2 transcription levels between tumor and normal tissues across cancer types using Oncomine and TIMER (Fig. 2A), and further analyzed the expression of NUF2 in NSCLC via UALCAN, TCGA and GEO. As shown in Fig. 2A, data from these databases revealed that NUF2 mRNA expression was significantly higher in NSCLC. We then examined the transcription levels of NUF2 in LUSC and LUAD, which were also significantly elevated (P<0.01). NUF2 mRNA was significantly over-expressed in TCGA-FPKM (LUSC) and GSE32863 (LUAD) (P<0.01) (Fig. 2B).

The association of NUF2 expression with prognosis and clinicopathological factors in NSCLC
We next explored the association of the NUF2 transcription level with prognosis and clinicopathological factors. We used the Kaplan-Meier plotter to evaluate whether NUF2 expression relates to the prognosis of NSCLC. High NUF2 expression was significantly associated with poorer overall survival (OS) in NSCLC (HR=1.91, 95% CI=1.38-2.65, log-rank P=7e-05) and in the LUAD subgroup (HR=1.54, 95% CI=1.21-1.97, log-rank P=0.00047) (Fig. 3Ba-b). However, there was no significant association between NUF2 expression and OS in LUSC (Fig. 3Bc). We further explored the association between NUF2 expression and clinicopathological features. Subgroup analysis of multiple clinicopathological features of NSCLC in UALCAN showed that the transcription level of NUF2 was significantly higher in NSCLC (both LUSC and LUAD) than in the normal group across subgroups defined by stage and smoking habits. We also found that the transcription level of NUF2 differed significantly by T and N stage of the AJCC TNM cancer stage in NSCLC in the TCGA database (Fig. 3A).
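The reported hazard ratios and confidence intervals can be cross-checked with a short Python sketch. It assumes the 95% CI is symmetric on the log scale and derives a Wald z statistic from it; this is only an approximate consistency check and is not the log-rank test that produced the reported P-values. The function name is hypothetical.

```python
from math import erfc, log, sqrt

def wald_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log HR), the Wald z statistic and its two-sided p-value
    from a hazard ratio and its 95% confidence interval."""
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided tail of the standard normal
    return se, z, p

# Values taken from the overall survival results reported above.
results = {}
for label, hr, lo, hi in [("NSCLC", 1.91, 1.38, 2.65), ("LUAD", 1.54, 1.21, 1.97)]:
    results[label] = wald_from_hr(hr, lo, hi)
    se, z, p = results[label]
    print(f"{label}: SE(log HR)={se:.3f}, z={z:.2f}, Wald p={p:.1e}")
```

The Wald p-values obtained this way fall in the same order of magnitude as the reported log-rank P-values, which suggests the hazard ratios and intervals are internally consistent.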

Functional annotation and protein-protein interaction
DEG analysis was performed according to NUF2 expression in GSE77803. The volcano plot showed 262 genes (red dots) significantly positively correlated and 164 genes (green dots) significantly negatively correlated with NUF2, and the top 20 genes most significantly positively and negatively correlated with NUF2 are shown in the heat map (Fig. 2Ca). We then assessed the associations among the top 20 positively and negatively correlated genes (Fig. 2Cc). Next, we determined the functional annotation and protein-protein interaction (PPI) network of these DEGs (including NUF2) in NSCLC. As shown in Fig. 4Aa, the DEGs were mainly enriched in organelle fission and nuclear division among Biological Process (BP) GO annotations, spindle among Cellular Component (CC) GO annotations, and ATPase activity among Molecular Function (MF) GO annotations. KEGG pathway enrichment of NUF2-interacting genes showed that cell cycle was the most enriched pathway (Fig. 4Ab): it had the smallest P-value (P=2.45e-10) and the largest number of involved genes (count=18). As shown in Fig. 4Ba, the STRING PPI network contained 22 nodes and 120 edges. The vast majority of the nodes in the network were upregulated DEGs. TTK, RRM2, RAD51AP1, PBK, NDC80, MELK, KIF4A, DLGAP5, CEP55, CCNB1 and CDC20 had the most edges in the network (Fig. 4Bc). In addition, we found a significant module via MCODE in Cytoscape, and the most significant pathway enriched in this module was cell cycle (Fig. 4Bd).
We further used GSEA with the MSigDB database to confirm these results. In the curated gene sets (C2.CP.KEGG), NUF2-related genes were enriched in cell cycle, pyrimidine metabolism, purine metabolism and DNA replication, while in the hallmark gene sets, NUF2-related genes were enriched in G2M checkpoint, E2F targets, mitotic spindle and PI3K-AKT-mTOR signaling (Fig. 4C).

Basic characteristics of NUF2 and the correlation of NUF2 with selected genes
To understand the characteristics of NUF2, we explored its basic information in GeneCards and COMPARTMENTS. As shown in Fig. 5Aa, NUF2 is a protein-coding gene located at 1q23.3. Fig. 5Ab shows the subcellular locations of NUF2 from COMPARTMENTS; the locations with the highest confidence were the nucleus and cytosol. These results indicate where NUF2 functions within the cell. To examine the association of NUF2 with selected genes such as EGFR, KRAS and ROS1 and with cell cycle-related genes, we used GEPIA. NUF2 was associated with KRAS, EGFR, ROS1 and PIK3CA, and also with CDK1/2/4/6 and E2F1, which are involved in the cell cycle. NUF2 was positively correlated with KRAS, EGFR, PIK3CA, CDK1/2/4/6 and E2F1, and negatively correlated with ROS1 (Fig. 5B). These relationships with cell cycle and tumor-related genes suggest that NUF2 is involved in tumorigenesis and tumor development.

NUF2 DNA methylation status in NSCLC
Using the UALCAN website for differential methylation analysis, we found that the promoter methylation level of NUF2 in LUSC was higher than in normal tissues. In contrast, the promoter methylation level of NUF2 in LUAD was lower than in normal tissues (Fig. 6A). We then examined whether the promoter methylation level of NUF2 was correlated with clinical characteristics. Subgroup analysis showed that the promoter methylation of NUF2 was possibly affected by stage, smoking status and N stage of the AJCC TNM cancer stage (Fig. 6B).

NUF2 alteration in NSCLC
We then used cBioPortal to determine the types and frequencies of NUF2 alterations based on whole-exome sequencing data from NSCLC in TCGA (42.3% LUSC and 57.7% LUAD). To highlight the role of NUF2 in NSCLC, we also compared NUF2 with NDC80, SPC24 and SPC25, the other components of the NDC80 complex. As shown in Fig. 7A, the NDC80 complex was altered in 148 of 1144 (12.9%) NSCLC patients, while NUF2 was altered in 92 of 1144 (8%), and most of these cases were amplifications. NUF2 alterations thus accounted for 62% of all NDC80 complex alterations. We then examined the specific alterations of each gene and found that NUF2 had more alteration sites than the other genes (Fig. 7Ac-f).
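The alteration frequencies quoted above follow from simple proportions, as the short Python check below shows. The variable and function names are illustrative; the counts are those reported in the text, and the 62% share is read as NUF2-altered cases over all cases with any NDC80 complex alteration.

```python
total_samples = 1144    # NSCLC patients in the TCGA cohort
complex_altered = 148   # patients with any NDC80 complex gene altered
nuf2_altered = 92       # patients with NUF2 altered

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

print(pct(complex_altered, total_samples))  # 12.9
print(pct(nuf2_altered, total_samples))     # 8.0
print(pct(nuf2_altered, complex_altered))   # 62.2
```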
We next explored the correlation between alteration and the prognosis of NSCLC in cBioPortal. As shown in Fig. 7B, when somatic mutations were selected as the genomic profile, the NUF2-altered group was significantly associated with a poorer prognosis (P=0.0407), whereas no obvious difference was found for the other genes.
We next examined the relationships between NUF2 expression and marker genes of various immune cell types in NSCLC. Markers of B cells, CD8+ T cells, neutrophils, macrophages, dendritic cells (DCs), NK cells, Th1 cells, Tregs and monocytes were tested via the TIMER database. Regardless of whether the correlation was adjusted, NUF2 in LUSC was negatively correlated with markers of several immune cell types: FCRL2, CD19 and MS4A1 in B cells; FCGR3B, CEACAM3, SIGLEC5, FPR1, CSF3R and S100A12 in neutrophils; CD68, CD84, CD163 and MS4A4A in macrophages; CD209 in dendritic cells; FOXP3 and CCR8 in Tregs; and C3AR1, CD86 and CSF1R in monocytes (Table 1). Meanwhile, NUF2 in LUAD was negatively correlated with MS4A1 in B cells; CD8A and CD8B in CD8+ T cells; CSF3R and S100A12 in neutrophils; KIR3DL3 and NCR1 in NK cells; and CSF1R in monocytes. We next used GEPIA to verify these results. In LUSC, the correlations between NUF2 and markers of macrophages, neutrophils and dendritic cells in GEPIA were similar to those in TIMER. In LUAD, the correlations between NUF2 and markers of B cells, CD8+ T cells and NK cells in GEPIA were similar to those in TIMER (Table 2).
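The correlations in Tables 1 and 2 are Spearman coefficients, some of them adjusted for tumor purity. The Python sketch below shows how such values can be computed, assuming that the adjustment is a first-order partial correlation on ranks; this is an illustrative assumption rather than TIMER's documented code path. All expression and purity values are hypothetical toy data.

```python
from math import sqrt

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """Spearman correlation of x and y after removing the rank effect of z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy values for six samples.
nuf2 = [2.1, 3.4, 1.0, 5.2, 4.8, 3.9]
marker = [7.0, 5.5, 8.1, 2.0, 3.3, 6.0]
purity = [0.9, 0.7, 0.8, 0.4, 0.5, 0.6]

rho = spearman(nuf2, marker)
rho_adjusted = partial_spearman(nuf2, marker, purity)
```

Comparing `rho` with `rho_adjusted` shows how much of an apparent association can be explained by purity alone, which is the reason both unadjusted and adjusted values are reported.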

Prognosis analysis of NUF2 expression in NSCLC based on immune cell infiltration
We have shown that NUF2 expression is related to immune cell infiltration in NSCLC and to the prognosis of NSCLC. We therefore asked whether the effect of NUF2 expression on prognosis in NSCLC is partly influenced by immune cell infiltration. We analyzed the correlation between NUF2 expression and prognosis according to the enrichment of the relevant immune cells in NSCLC. In LUAD, higher NUF2 expression was associated with poor prognosis in the subgroups with enriched B cells, enriched CD4+ T cells and enriched macrophages (HR=2.01) (Fig. 8B). In LUSC, high NUF2 expression in the subgroup with enriched macrophages was associated with a better prognosis (Fig. 8B).

Discussion
With the advent of modern genetics and molecular biology in the 1980s, researchers gradually gained a clearer view of the genetic alterations that underlie uncontrolled proliferation in tumor cells [17]. These alterations include (but are not limited to) high expression of genes that drive or participate in cell cycle progression, such as cyclin D1 (CCND1), cyclin-dependent kinase 4 (CDK4) and cyclin-dependent kinase 6 (CDK6) [18]. NUF2 is a component of the molecular linker between the kinetochore attachment site and the tubulin subunits within the lattice of the attached plus ends [19]. NUF2 is also an oncogene that is over-expressed in several cancers [5,20], including lung adenocarcinoma. Consistent with a previous study [7], NUF2 expression in multiple databases was significantly higher in tumors than in normal tissues, in both LUAD and LUSC.
In addition, we selected datasets randomly from TCGA and GEO to assess the expression of NUF2 in LUAD and LUSC, and again found that NUF2 expression was significantly higher than in normal tissues. Taken together, the prognosis analysis and the correlation analysis with clinicopathological factors indicate that higher NUF2 expression is associated with poorer prognosis and worse clinicopathological stage. In the GO, KEGG, PPI and GSEA analyses alike, NUF2 was enriched in the cell cycle in NSCLC. This is consistent with the view that tumor pathogenesis is inseparable from cell cycle abnormalities. The cyclin-dependent kinase (CDK)-RB-E2F axis is the key transcriptional mechanism that drives cell cycle progression. Alterations in one or more components of this axis (cyclins, CDKs, CDK inhibitors and the RB family of proteins) occur in almost all cancers and lead to increased oncogenic E2F activity and uncontrolled proliferation [21]. In our study, we found associations between NUF2 and genes related to cell cycle progression or oncogenes involved in the progression of NSCLC, such as E2F1, CDK1/2/4/6, KRAS, EGFR, ROS1 and PIK3CA. We also found via GeneCards that NUF2 is predominantly localized to the nucleus. Therefore, NUF2 may be inferred to be a key gene involved in the occurrence, development and prognosis of NSCLC.
Variations in the epigenome are thought to reflect the influence of genetic and environmental risk factors on multiple diseases. So far, DNA methylation is still the only epigenetic marker that can be stably detected in various samples [22]. Changes in normal DNA methylation patterns include DNA hypomethylation, which occurs pathologically in normally methylated regions of the genome, and DNA hypermethylation, which usually occurs in the CpG islands of gene promoters [23]. In the DNA methylation analysis, we found that the NUF2 methylation level was lower in LUAD than in normal tissues. Surprisingly, the NUF2 methylation level was higher in LUSC than in normal tissues. We found the same trends when exploring the association between NUF2 methylation level and clinicopathological features. The difference in NUF2 methylation levels between NSCLC subtypes may suggest that the NUF2 gene is under different epigenetic regulation in LUSC and LUAD. Methylation in the promoter region of a gene can inhibit gene expression at the transcriptional level, and the overexpression of NUF2 at the transcriptional level in non-small cell lung cancer has been confirmed by multiple studies, including ours. As there is no current research on NUF2 methylation, further research is needed to identify the loci of NUF2 gene methylation and to explore the mechanism linking NUF2 expression and NUF2 methylation. Considering the relationship of NUF2 with cell cycle-related genes and lung cancer oncogenes, future studies should explore whether different levels of NUF2 methylation have different effects on the occurrence, development and prognosis of lung adenocarcinoma and lung squamous cell carcinoma, and compare NUF2 methylation between LUAD and LUSC.
Although previous studies have found missense mutations of NUF2 in large-scale tumor sequencing projects [24], few studies have explored NUF2 alterations in NSCLC. Our cBioPortal analysis revealed the alteration frequency of NUF2 in NSCLC. The alteration rate of NUF2 (8%) was higher than that of the other components of the NDC80 complex. We also found that gene amplification was more frequent in NUF2 than other types of alteration.
Genome instability is a molecular genetic marker of tumorigenesis. Gene amplification, a major form of genome instability, plays an important role in the occurrence and development of many human malignant tumors [25,26]. Our results indicate that the NUF2 gene is mainly amplified in non-small cell lung cancer, suggesting that NUF2 gene amplification may be related to the occurrence and development of NSCLC. Because research on NUF2 gene amplification is lacking, more studies should focus on the manifestations of NUF2 gene amplification in NSCLC and on the relationship between NUF2 gene amplification and NSCLC drug resistance or tumor cell escape from growth inhibition [27,28]. Because more and more gene mutations have been found in tumor occurrence and metastasis, we assessed whether NUF2 alteration affects tumor prognosis. We found a difference in prognosis between the NUF2-altered and unaltered groups in NSCLC, but no significant difference for NDC80 and SPC25. This suggests that the NDC80 complex may affect the prognosis of NSCLC via NUF2 mutation.
Another important finding of this study is that NUF2 expression was correlated with the infiltration level of various immune cells in NSCLC. A large number of studies have examined immune cells in malignant tumor tissues and their relationship with tumor occurrence, development and prognosis [29,30]. Immune cell infiltration in primary lung squamous cell carcinoma is associated with a better prognosis [31]. High levels of both CD8+ T cells and CD4+ T cells were also found to be a significant marker of better prognosis for patients with NSCLC [32]. We explored the relationship between NUF2 and immune cell infiltration in NSCLC. Our results demonstrate a negative relationship between NUF2 expression level and the infiltration level of multiple immune cells in LUAD and LUSC. Moreover, the correlation between NUF2 expression and the marker genes of immune cells implicates NUF2 in regulating tumor immunology in LUSC and LUAD. In the analysis above, in both LUAD and LUSC, the NUF2 gene was negatively correlated with macrophages and DCs. These results reveal a potential regulatory role of NUF2 in the polarization of tumor-associated macrophages (TAMs) and DCs. TAM infiltration is closely related to tumor-specific pathological characteristics, such as immunosuppression, angiogenesis and lymphangiogenesis, invasion and metastasis, and drug resistance [33]. Although DCs constitute a rare immune cell population in multiple tumors, these cells are central to the initiation of antigen-specific immunity and tolerance [34]. Thus, these results also reveal a potential regulatory role of NUF2 in cancer immunology. However, the pathway through which NUF2 regulates immune cell infiltration still needs further research. In addition, the prognostic evaluation based on immune cells showed poorer survival with high NUF2 expression in enriched macrophages in both LUAD and LUSC. These findings suggest that NUF2 may affect the survival rate of NSCLC via TAMs.

Conclusions
In summary, NUF2 transcription levels were significantly increased in NSCLC. Functional annotation and PPI analysis showed that NUF2 was enriched in the cell cycle. In the multi-omics analysis, we found that the NDC80 complex may affect the prognosis of NSCLC via NUF2 mutation, and that the NUF2 methylation level was lower in LUAD and higher in LUSC than in normal tissues. Elevated NUF2 expression was negatively correlated with immune cell infiltration and with the prognosis of NSCLC, and NUF2 may affect the survival rate of NSCLC via TAMs. However, further studies are needed to assess the diagnostic and therapeutic role of NUF2 in NSCLC.

Figures

Figure 1 Flow

Figure 2 Difference

Figure 5 Basic

Figure 6 DNA

Table 1
Correlation results between NUF2 and markers of immune cells via TIMER. LUSC, lung squamous cell carcinoma; LUAD, lung adenocarcinoma; NK cells, natural killer cells; Th1 cells, type 1 helper T cells; Treg, regulatory T cells; COR, r value of Spearman's correlation; P, p-value; Purity, correlation adjusted by purity; Age, correlation adjusted by age.

Table 2
Correlation results between NUF2 and markers of immune cells via GEPIA. LUSC, lung squamous cell carcinoma; LUAD, lung adenocarcinoma; NK cells, natural killer cells; R, r value of Spearman's correlation; P, p-value.