Identify key genes and pathways in pancreatic ductal adenocarcinoma via bioinformatics analysis

Background: Pancreatic cancer has many pathologic types, among which pancreatic ductal adenocarcinoma (PDAC) is the most common one. Bioinformatics has become a very common tool for the selection of potentially pathogenic genes. Methods: Three data sets containing the gene expression profiles of PDAC were downloaded from the gene expression omnibus (GEO) database. The limma package of R language was utilized to explore the differentially expressed genes (DEGs). To analyze functions and signaling pathways, the Database Visualization and Integrated Discovery (DAVID) was used. To visualize the protein-protein interaction (PPI) of the DEGs ,Cytoscape was performed under the utilization of Search Tool for the Retrieval of Interacting Genes (STRING). With the usage of the plug-in cytoHubba in cytoscape software, the hub genes were found out. To verify the expression levels of hub genes, Gene Expression Profiling Interactive Analysis (GEPIA) was performed. Last but not least, UALCAN analysis online tool was implemented to analyze the overall survival. Results: The 376 DEGs were highly enriched in biological processes including signal transduction, apoptotic process and several pathways, mainly associated with Protein digestion and absorption and Pancreatic secretion pathway. The expression levels of nucleolar and spindle associated protein 1 (NUSAP1) and SHC binding and spindle associated 1 (SHCBP1) were discovered highly expressed in pancreatic ductal adenocarcinoma tissues. NUSAP1 and SHCBP1 had a high correlation with prognosis. Conclusions: The findings of this bioinformatics analysis indicate that NUSAP1 and SHCBP1 may be key factors in the prognosis and treatment of pancreatic cancer.


Background
Pancreatic cancer is famous for its high malignancy, rapid development and poor prognosis. The fiveyear survival rate of pancreatic cancer is no more than 6%. 1 . It has many pathologic types, among which pancreatic ductal adenocarcinoma (PDAC) is the most common one and has a 5-year survival rate less than 3% . 2,3 .
Even to this day, no treatment other than surgical excision such as chemotherapy or radiation has had a significant effect on pancreatic cancer. The specific mechanism of pancreatic cancer's strong ability to proliferate and invade has not been explored. Pancreatic cancer can easily invade and metastasize to surrounding tissues through blood vessels and lymph nodes. 4 . Many patients who have been diagnosed with pancreatic cancer have lost the opportunity for surgery because it is usually advanced.
Up to now, we still need more biomarkers which have high sensitivity and specificity rather than CA19-9, the only biomarker approved by the Food and Drug Administration. CA19-9 has many limitations, especially specificity and effectiveness . 5 . Differentially expressed genes play an important role in the proliferation, invasion and progression of pancreatic cancer, includingp16, SMAD4, K-ras and p53. 6 . But they still couldn't fully explain how pancreatic cancer develops.
Bioinformatics has become a very common tool for the selection of potentially pathogenic genes.
Many researchers have used bioinformatics to identify potential oncogenes. Despite the lack of in vivo or in vitro analysis, it is still a clever way to find potential oncogenes. For example, Look for potential oncogenes in ovarian cancer and pancreatic cancer. 7-9 .
Microarray analysis has been a very mature technology and has been widely used to screen differentially expressed genes in cancer. This present study is set out to investigate the DEGs between pancreatic ductal adenocarcinoma tissues and normal pancreatic tissues by determining gene expression profiles from GEO database. Besides, the functional enrichment and interaction network analyses of the DEGs were verified to help us explore the hub genes and the underlying molecular mechanisms of pancreatic ductal adenocarcinoma.
In result, 376 DEGs and 30 hub genes were identified, which may be potential biomarkers for pancreatic ductal adenocarcinoma. Among these genes, SHCBP1 and NUSAP1 were worthy of further experimental verification of their mechanism in pancreatic tumors and were likely to be a key factor in the prognosis and treatment of pancreatic cancer 2. Methods 2.1 Affymetrix Microarray Data National Center for Biotechnology Information Search database created a gene expression database (http://www.ncbi.nlm.nih.gov/geo), which preserves the high-throughput gene expression data provided by research institutions around the world.
Utilizing the keywords "pancreatic ductal adenocarcinoma", we searched the GEO database. Three GEO datasets were downloaded, including GSE46234 contributed by Raeder H, GSE46385 contributed by Walters DM, and GSE71989 contributed by Schmittgen T. These gene expression profiles of pancreatic ductal adenocarcinoma were downloaded based on GPL570 Affymetrix Human Genome U133 Plus 2.0 Array. The GSE46234 dataset included 4 pancreatic ductal adenocarcinoma tissues and 4 normal pancreatic tissues. The GSE46385 consisted of 9 pancreatic ductal adenocarcinoma tissues and 3 normal pancreatic tissues. GSE71989 contained 13 pancreatic ductal adenocarcinoma tissues samples and 8 normal pancreatic tissues. We presented the detailed information about the datasets in Table 1.

Screening for DEGs
By using Affy package of R language, we manipulated the raw data by adjusting background, normalizing the data and log-transforming the raw data values. 10 . Using Perl language software, we merge these GEO datasets together and batch normalized the data set by utilizing the sva package in R language. These DEGs in pancreatic ductal adenocarcinoma and normal pancreatic tissues were analyzed by utilizing the limma package. Log2-foldchange(log2FC) > 2 and a corrected value < 0.05 were set as the cut-off criteria of DEGs samples.

GO and KEGG Pathway Enrichment Analyses
We utilized the DAVID(version 6.8 ) online database to explore the Biological functions of the DEGs obtained from the integration of microarray data . 11 .In order to verify the enrichment signaling pathways of DEGs, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment analyses were performed with DAVID.

PPI Network Integration, Modules Analysis, and Selection of Hub Genes
We utilized the STRING (version11) online database to build a PPI network,in order to identify the interaction between protein-protein interaction(PPI) . 12 . We set the highest confidence of the argument of interactions at > 0.7. To visualize and to analyze the PPI network, we used the Cytoscape (version 3.6.1) software. The plug-in cytoHubba was utilized for exploring key nodes and fragile motifs in the PPI network. 13 .
2.5 RNA sequencing expression and survival analysis and of core genes UALCAN is a interactive web resource for analyzing cancer OMICS data (TCGA and MET500). It has high quality graphics using java script and CSS. 14 . To identify these DEGs, we applied the GEPIA website to analyze the data of RNA sequencing expression on the basis of thousands of samples from TCGA and Genotype-Tissue Expression(GTEx

GO and KEGG Analysis of DEGs.
As shown in Table 3  We presented the KEGG analysis results in Table 4, which indicated that DEGs were particularly enriched in Protein digestion and absorption, Pancreatic secretion, ECM-receptor interaction and some other pathways. (P < 0.05).

Protein-protein interaction network (PPI) and modular analysis
In order to explore the biological characteristics of these DEGs, we created a PPI network with the usage of the STRING database. We presented the result in Fig. 4. Meanwhile we utilized the cytoHubba plug-in to screen hub genes and the result is shown in Fig. 5

Survival Analysis
By comparing all the literature on pancreatic cancer to date, we found two genes that had not been explored and studied in pancreatic cancer especially pancreatic ductal adenocarcinoma.
In order to validate the expressions of these two hub genes, we utilized the GEPIA website to analyze the data of RNA sequencing expression on the basis of 350 samples ( Fig. 6.Aand Fig. 6.B). Moreover, UALCAN website was used to analyze the hub genes survival in the 177 samples which is derived from the TCGA project ( Fig. 6.Cand Fig. 6.D).
The result indicated the expressions of NUSAP1 and SHCBP1 were closely related to the survivals in pancreatic adenocarcinoma.

Discussion
In the past few decades, many causes and potential mechanisms for the formation and development of pancreatic adenocarcinoma have been identified by a lot of experts.
However, there was no significant improvement in 5-year survival. Moreover, the identification of reliable molecular markers with high prognostic value has not been found in pancreatic ductal adenocarcinoma. Therefore, we need to explore and discover a specific biomarker to detect early pancreatic adenocarcinoma, which is conducive to the treatment and survival of patients. Last but not least, we need to discover and validate new molecular targets to develop drugs that can effectively treat pancreatic cancer. Many studies concentrate on an independent genetic event, or the result is generated from independent studies which are inconsistent with each other by microarray analysis. [16][17][18] .
Some GEO samples (GSM) of pancreatic adenocarcinoma were not grouped into pancreatic ductal adenocarcinoma, the result may lose some details or may even be wrong for the mixed disease subtypes. Pancreatic adenocarcinoma pathogenesis in different patients may depend on common changes of the expression of particular critical genes, and rather on personal particular changes of different genes. Comparing pancreatic adenocarcinoma instead of pancreatic ductal adenocarcinoma tissues with normal tissues may miss some important genes. Our study was done by comparing pancreatic ductal adenocarcinoma tissues with normal pancreatic tissues which may find some potential genes had not found in previous research. Our researcher found, in agreement with previous studies, most of the hub genes plays an important role in the development and progression of pancreatic ductal adenocarcinoma and they were reported and explored by previous research. That proves the reliability of our research in some ways.
TOP2A could activates EMT process which was highly associated with tumor metastasis in pancreatic cancer patients and could enhance tumor progression . 19 . PHGDH could down regulate the expression of CCNB1,CCND1, MMP-2, and MMP-9 in pancreatic cancer cells and inhibited the cell invasion, migration, and proliferation . 20 . The down-regulated expression of CDC20 in pancreatic cancer cells could do the same work. Meanwhile it induced cell apoptosis in pancreatic cancer cells. 21 . The knock-down of NDC80 in Panc-1 cells could inhibited cell proliferation and colony formation. Moreover, it infected Panc-1 cells resulted in induction of apoptosis and cell cycle arrest at G2/M phase . 22 . The interference suppression of KIF20A resulted in an inhibition of motility, invasion and proliferation of pancreatic cancer cell lines. 23 . The upregulated PBK/TOPK in pancreatic cancer and upregulated expression levels was highly correlated with the invasive property of pancreatic cancer cells. By regulating PBK/TOPK pathway, the invasion ability of PDAC cells can be regulated . 24 .
The knock-down of IMP3 in pancreatic cancer reduced PDAC cell viability, extracellular matrix adhesion and invasion. IMP3-depleted cells presented lower levels of CD44 protein and KIF11 mRNA. 25 . Overexpression of BUB1B, CDK1, CCNA2 and CDC20 in pancreatic cancer was highly associated with disease-free survival in PDAC patients. 26 .
As for the genes we found, Nucleolar and Spindle Associated Protein 1(NUSAP1) and SHC binding and spindle associated 1(SHCBP1) were both hub genes and interact with the hub genes which we discussed above, which suggested that NUSAP1 and SHCBP1 may also play a very important role in pancreatic cancer.
We found that NUSAP1 had been reported the expression of NUSAP1 is up-regulated in non-small cell lung cancer, which is associated with the growth and development of non-small cell lung cancer and prognosis of the patients. 27 . In non-small cell lung cancer, the inhibited expression of NUSAP1 could result in the blocking the cell cycle on G1 phase. 28 . In addition, NUSAP1 Inhibited Cell Proliferation in Invasive Breast Cancer Cells. 29 . Last but not least,NUSAP1 was associated with HCC onset, progression, and prognosis, and exhibited higher expression in HCC compared with normal livers. 30 .
In all, NUSAP1 was associated with different types of tumors. Until now, there is no such study in

Conclusions
All in all, our data analysis study found SHCBP1 and NUSAP1 which has not been published in pancreatic ductal adenocarcinoma. They were identified on the base of mixed and normalized three different microarray datasets. Just as other researchers have used bioinformatics to explore potential oncogenes, we demonstrate the value of these two molecules for further exploration and verification from different perspectives. Results revealed that NUSAP1 and SHCBP1 were worthy of further experimental verification of their mechanism in pancreatic tumors and are likely to be a key factor in the prognosis and treatment of pancreatic cancer especially pancreatic ductal adenocarcinoma.

Declarations
Ethics approval and consent to participateNot applicable.Our manuscript does not report on or involve the use of any animal or human data or tissue, this section is not applicable to our submission.

Consent for publication: Individual person's data contained in our manuscript is shared by GEO
which is opened to publication.
Availability of data and material: Cytoscape (version 3.6.1) software/ R language(version 3.6.1) is used in this manuscript

Competing interests:
The authors declare that they have no competing interests.

Authors' contributions:
The authors would like to thank his tutors, professor YD and professor SC, for their conception and design of experiments. Moreover, they also drafted the work and revised it critically for important content. The authors also would like to thank DQ for statistical analysis.
Meanwhile, the analysis tools was provided by JP and ZW. The figures and tables was prepared by CY.
All authors have read and approved the manuscript.  Tables   Table I Details Abbreviation: DEGs, differentially expressed genes Volcano.Volcano plot of gene expression profile data in pancreatic ductal adenocarcinoma tissues and the normal pancreatic tissues. The red, green, and gray points represent upregulated genes, downregulated genes, and none differentially expressed genes, respectively.    The hub genes. The nodes meant proteins; the edges meant the interaction of proteins; green meant down-regulated DEGs and red meant up-regulated DEGs.