Bioinformatics analysis to screen and identify key biomarkers of papillary renal cell carcinoma

Background: Papillary renal cell carcinoma (PRCC) is the second most common type of renal cell carcinoma after clear cell renal cell carcinoma (ccRCC). Its pathological classification is controversial and its molecular mechanism is poorly understood. Therefore, identifying key genes and their biological pathways is of great significance for elucidating the molecular mechanisms of PRCC occurrence and progression. Methods: The PRCC-related datasets GSE7023, GSE48352 and GSE15641 were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified, and Gene Ontology (GO) term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed. Cytoscape and STRING were used to construct the protein-protein interaction (PPI) network and to perform module analysis to find hub genes and key pathways. Hierarchical clustering of hub genes was constructed using UCSC. Overall survival and relapse-free survival of hub genes were analyzed using Kaplan-Meier Plotter. UALCAN was applied to analyze gene expression in primary tissues and across different stages, subtypes and races. Results: A total of 214 DEGs were identified, including 205 down-regulated genes and 9 up-regulated genes. The DEGs were concentrated in angiogenesis, kidney development, oxidation-reduction process, metabolic pathways, etc. Seventeen hub genes were screened out, which were mainly enriched in the biological processes of angiogenesis, cell adhesion, platelet degranulation, and leukocyte transendothelial migration. Survival analysis showed that EGF, KDR, CXCL12, REN, PECAM1, CDH5, THY1, WT1, PLAU and DCN may be related to the carcinogenesis, metastasis or recurrence of PRCC.
Conclusions: The DEGs and hub genes identified in the present study provide clues to the specific molecular mechanisms of PRCC occurrence and development, and may be potential molecular markers and therapeutic targets for accurate classification and efficient diagnosis and treatment of PRCC.

lowest. [1] Because the specific molecular pathogenesis of PRCC is poorly understood, there has been no effective targeted therapy for papillary carcinoma to date. There is increasing evidence that abnormal gene expression and mutation are related to the development of PRCC. Linehan et al. [2] found through clinical studies that the occurrence of PRCC type 2 was closely related to mutations of the chromatin modification genes SETD2, BAP1 and PBRM1, while MET mutations in the tyrosine kinase domain were related to PRCC type 1. In hereditary leiomyomatosis and renal cell carcinoma (HLRCC), an aggressive form of PRCC type 2, mutation of fumarate hydratase (FH), located on chromosome 1, is relatively common. [3] However, there is currently no standard treatment for the disease, and mortality rates for PRCC patients remain high. Therefore, understanding the exact molecular mechanisms involved in the carcinogenesis, metastasis, and recurrence of PRCC and formulating effective diagnosis and treatment strategies are of vital importance for improving the survival rate of patients.
Microarray technology and bioinformatics analysis are important and widely used methods for screening and identifying differentially expressed genes (DEGs) in disease development, and they have helped us to explore a wide variety of DEGs involved in PRCC carcinogenesis and progression. However, the false positive rate of a single microarray analysis is relatively high, which makes it difficult to obtain reliable results. Therefore, in this study, three mRNA microarray datasets from the Gene Expression Omnibus (GEO) were downloaded and analyzed to obtain DEGs between PRCC and normal tissues. Subsequently, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses and a protein-protein interaction (PPI) network were used to help us understand the key genes and pathways for carcinogenesis and progression. As a result, a total of 214 DEGs and 17 hub genes were selected, which may be potential molecular targets and biomarkers of PRCC.

Materials and methods
Acquisition of microarray data
GEO (http://www.ncbi.nlm.nih.gov/geo), hosted by the National Center for Biotechnology Information (NCBI), is a public functional genomics database that stores gene expression datasets, platform information and original series. Three gene expression datasets [GSE7023 (4), GSE48352 and GSE15641 (5)

Screening of DEG
Differential expression analysis was performed for each dataset using the GEO2R online tool provided by the GEO database. The screening criteria for DEGs between PRCC and non-cancerous samples were |logFC (fold change)| > 1 and adjusted P value < 0.05. Based on the platform annotation, the probes were converted into the corresponding gene symbols. Probe sets without corresponding gene symbols and duplicate entries were removed. The intersection of the three datasets was then taken to determine the common DEGs, and the online Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to draw the Venn diagram of DEGs.
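The screening and intersection steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GEO2R workflow itself: the function name `select_degs`, the tuple layout (gene symbol, logFC, adjusted P value), the rule of keeping the first probe per gene, and all values in the toy tables are assumptions made up for the example.

```python
def select_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return {gene_symbol: logFC} for rows with |logFC| > cutoff and adj. P < cutoff.

    rows: iterable of (gene_symbol, log_fc, adj_p) tuples (hypothetical layout).
    Probes without a gene symbol are dropped; for duplicated symbols the
    first passing probe is kept (an assumption, other rules are possible).
    """
    degs = {}
    for symbol, log_fc, adj_p in rows:
        if not symbol:
            continue  # probe set without a corresponding gene symbol
        if abs(log_fc) > lfc_cutoff and adj_p < padj_cutoff and symbol not in degs:
            degs[symbol] = log_fc
    return degs


# Toy tables with made-up values, one per dataset.
ds1 = [("GENE_A", -2.1, 0.001), ("GENE_B", -1.5, 0.01), ("", -3.0, 0.001),
       ("GENE_C", -0.8, 0.001), ("GENE_A", -1.9, 0.002)]
ds2 = [("GENE_A", -1.7, 0.003), ("GENE_B", -1.2, 0.04), ("GENE_D", -2.0, 0.2)]
ds3 = [("GENE_A", -2.5, 0.0001), ("GENE_B", -1.1, 0.03), ("GENE_D", -1.4, 0.01)]

per_dataset = [select_degs(ds) for ds in (ds1, ds2, ds3)]

# Common DEGs are the genes that pass the filter in every dataset.
common = set(per_dataset[0]).intersection(*per_dataset[1:])
print(sorted(common))
```

In this toy input, GENE_C fails the fold-change cutoff and GENE_D fails the adjusted P value cutoff in one dataset, so neither survives the intersection.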

PPI network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes online database (STRING; https://string-db.org, version 11.0) was used to build the PPI network of the DEGs, and the results were visualized with Cytoscape (version 3.6.1). An interaction with a combined score > 0.4 was considered statistically significant. The Molecular Complex Detection (MCODE) plug-in (version 1.5.1) in Cytoscape was used to identify the most important modules in the PPI network. The selection criteria were as follows: Degree Cutoff=2, Node Score Cutoff=0.2, K-Core=2 and Max depth=100. The genes in the module were then subjected to KEGG and GO analysis using DAVID and KOBAS 3.0.
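The edge filtering described above, together with a simple node-degree count of the kind used to rank genes in a PPI network, can be illustrated as follows. This is a minimal sketch under stated assumptions: the edge list is invented, the node names are placeholders, the combined score is assumed to be on a 0 to 1 scale, and the toy degree threshold is chosen only for the example. It does not reproduce the STRING, Cytoscape, MCODE or cytoHubba implementations.

```python
from collections import Counter

# Hypothetical edge list: (node_a, node_b, combined_score on a 0-1 scale).
edges = [
    ("A", "B", 0.90),
    ("A", "C", 0.45),
    ("B", "C", 0.30),
    ("A", "D", 0.70),
    ("C", "D", 0.40),
]

SCORE_CUTOFF = 0.4  # interactions must have a combined score strictly above this

# Keep only interactions above the cutoff.
kept = [(a, b) for a, b, score in edges if score > SCORE_CUTOFF]

# Degree of a node = number of retained interactions it takes part in.
degree = Counter()
for a, b in kept:
    degree[a] += 1
    degree[b] += 1

# Toy threshold for this small example; any real cutoff depends on the network.
TOY_DEGREE_CUTOFF = 2
hubs = sorted(node for node, d in degree.items() if d > TOY_DEGREE_CUTOFF)
print(kept, dict(degree), hubs)
```

Note that the edge with a score of exactly 0.40 is dropped because the criterion is a strict inequality.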

Selection and analysis of hub genes
Through degree algorithm, 17 PPI genes with scores greater than 15 on cytoHubba in Cytoscape were

Identification of DEG in PRCC
After the microarray results were standardized, DEGs in the three datasets were identified (906 in GSE7023, 823 in GSE48352, and 2051 in GSE15641) (Fig. 1). The overlap between the three datasets contained 214 genes, as shown in the Venn diagram (Fig. 2a), comprising 9 up-regulated genes and 205 down-regulated genes between PRCC and normal tissues.

PPI network construction and module analysis
The PPI network of DEGs was constructed through STRING and Cytoscape, and the most significant module, consisting of 10 nodes and 43 edges, was obtained (Fig. 2b, 2c). DAVID and KOBAS 3.0 were used for functional enrichment analysis of the genes in this module. The results showed that the module genes were mainly enriched in angiogenesis, cell adhesion, platelet degranulation, leukocyte transendothelial migration, pathways in cancer, the PI3K-Akt signaling pathway and so on (Fig. 5, Fig. 6).
Hierarchical clustering analysis showed that hub genes could distinguish PRCC samples from normal samples (Fig. 8). Hub genes were analyzed for overall survival and recurrence-free survival using Kaplan-Meier Plotter. PRCC patients with changes in EGF, KDR, CXCL12, REN, PECAM1, CDH5, THY1, WT1, PLAU, and DCN showed poorer overall survival and recurrence-free survival. However, patients with KNG1-altered PRCC had poorer overall survival but no significant difference in recurrence-free survival (Fig. 9).
Among the genes with significant differences in overall survival and recurrence-free survival, PECAM1 and PLAU were seed genes in the significant modules screened by MCODE, suggesting that they might play a crucial role in the carcinogenesis and progression of PRCC. The Oncomine analysis of PRCC and normal tissues showed that PECAM1 and PLAU were down-regulated in PRCC in different datasets (Fig. 10). UALCAN was used to analyze the expression of the two genes in PRCC tissue and normal tissue, in primary tissues and across different stages, subtypes and races. As shown in Fig. 11, PECAM1 and PLAU were down-regulated in PRCC tissue compared with normal kidney tissue, which was consistent with the results of the UCSC and Oncomine analyses. Moreover, the expression of both genes in PRCC tissues was related to stage, subtype and race. Notably, PECAM1 was most significantly down-regulated in stage 1 Caucasian PRCC type 1 patients, while PLAU was most significantly down-regulated in stage 4 CIMP type Asian PRCC patients.
Fig. 7 The biological process analysis of hub genes was constructed using BiNGO. The color depth of nodes refers to the corrected P-value of ontologies. The size of nodes refers to the number of genes involved in the ontologies. P<0.01 was considered statistically significant.
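The survival comparisons reported above rest on Kaplan-Meier survival curves. As a rough illustration of the underlying estimator only, the sketch below computes S(t) as the product over event times of (1 - d_i / n_i), where d_i is the number of events at time t_i and n_i the number of patients still at risk. The function name `kaplan_meier` and the follow-up data are hypothetical; the sketch does not reproduce the Kaplan-Meier Plotter tool and omits group comparison tests such as the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Made-up follow-up data for five patients.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients do not lower the curve themselves, but they shrink the risk set, which makes each later event weigh more.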

Discussion
PRCC is the second most common subtype of renal cell carcinoma (RCC) after ccRCC, accounting for approximately 15% of renal tumors, and is receiving increasing attention worldwide due to its rising incidence. The pathological classification of PRCC is controversial. A recent study has shown that molecular markers can help to pinpoint different subtypes of PRCC and are of great significance for the development of targeted drugs and better patient stratification. [6] However, the molecular pathogenesis of PRCC is poorly understood, and there is no effective targeted therapy for papillary carcinoma, so the mortality rate of patients with PRCC remains high. Therefore, there is an urgent need to explore potential molecular markers for accurate typing and efficient diagnosis and treatment of PRCC.
Microarray technology and bioinformatics analysis have been widely used as effective methods to screen and identify novel biomarkers in various diseases and have helped us to identify a variety of DEGs involved in the carcinogenesis and progression of PRCC. In this study, three mRNA microarray datasets from the GEO database were analyzed to obtain DEGs between PRCC tissue and normal tissue. A total of 214 DEGs were screened out, including 205 down-regulated genes and 9 up-regulated genes.
To explore the interactions between DEGs, GO and KEGG enrichment analyses were performed. The results showed that the DEGs were mainly concentrated in kidney development, angiogenesis, oxidation-reduction process and metabolic pathways. Previous reports have shown that pathological angiogenesis is important for the growth and spread of cancer because it supplies nutrients and oxygen and provides conduits for distant metastases. [7] Oxidoreductase activity has been linked to tumor suppressors and plays an important role in tumor antioxidant reactions. [8] In addition, increased metabolic levels directly caused by gene mutations and modifications of cancer-related protein expression can promote the occurrence and development of cancer. [9] In short, these reports are consistent with our analysis. For the most significant module, the GO terms were mainly angiogenesis, cell adhesion, cell migration, epithelial to mesenchymal transition, and negative regulation of the intrinsic apoptotic signaling pathway in response to DNA damage, while the KEGG pathways were mainly leukocyte transendothelial migration, pathways in cancer, the Rap1 signaling pathway, and the PI3K-Akt signaling pathway.
We defined 17 genes with degree scores greater than 15 as hub genes and analyzed their survival. Among the genes with significant differences in overall survival and recurrence-free survival analyses, PECAM1 and PLAU were seed genes in the significant modules screened by MCODE, indicating that they may play a crucial role in the carcinogenesis or progression of PRCC. Platelet and endothelial cell adhesion molecule 1 (PECAM1, also known as CD31) encodes a protein associated with angiogenesis and extracellular circulation that is involved in tumor growth and spread. [10] One study has shown that in patients with ccRCC, the expression of PECAM1 is increased, which is associated with mild clinicopathological parameters, and that it is significantly increased only in the early stage of the disease. [11] However, our analysis showed that PECAM1 expression was down-regulated in stage 1 patients with PRCC type 1. Therefore, PECAM1 may be an early diagnostic indicator to distinguish ccRCC from PRCC. In addition, Terashima M et al. found that in stage II/III gastric cancer patients, the level of PECAM1 in the primary tumor was associated with a high risk of hematogenous metastasis. [12] Another study suggests that resistin affects the expression of PECAM1 through CAP1, thus inducing EMT and stemness to promote the metastatic potential of breast cancer cells. [13] PLAU encodes a serine protease involved in degradation of the extracellular matrix and possibly in tumor cell migration and proliferation. One study shows that miR-193a-3p may in part inhibit the growth, migration and angiogenesis of colorectal cancer (CRC) cells by targeting PLAU. [14] Novak CM et al. found that fluid shear stress could induce significant up-regulation of the PLAU gene and an increase in urokinase activity, thereby promoting proliferation, invasiveness and chemotherapeutic resistance of breast cancer cells. [15] Survival analysis showed that changes in PECAM1 and PLAU were significantly associated with overall survival and recurrence-free survival in PRCC patients. Oncomine analysis showed that PECAM1 and PLAU were down-regulated in PRCC in different datasets. UALCAN analysis showed that low expression of PECAM1 and PLAU in PRCC tissues was associated with different stages, subtypes, and races. Notably, PECAM1 was most significantly down-regulated in stage 1 Caucasian PRCC type 1 patients, while PLAU was most significantly down-regulated in stage 4 CIMP type Asian PRCC patients. These results suggest that PECAM1 and PLAU play a critical role in the typing and efficient diagnosis and treatment of PRCC.
Other hub genes have also been reported in various cancers. Low expression of ALB is a useful and independent prognostic biomarker in patients with advanced RCC. [16] EGF can promote cell migration of ccRCC. [17] KDR polymorphism is associated with survival time in patients with advanced gastric cancer. [18] The CXCL12/CXCR4 axis promotes the development, invasion and metastasis of pancreatic cancer by managing the tumor microenvironment. [19] Abnormal (pro)renin receptor expression promoted the occurrence of CRC through the Wnt/β-catenin signaling pathway. [20] A cohort study confirms PLG as a biomarker for the diagnosis of pancreatic ductal adenocarcinoma (PDAC). [21] The overexpression of KNG1 inhibited the proliferation of glioma cells and induced their apoptosis. [22] Cadherin 5 (CDH5) is expressed in various malignant tumor cells, such as gastric cancer cells, and plays a role in cell adhesion. [23] C3 production by cancer cells in the CSF is associated with clinical processes such as metastasis and recurrence. [24] In epithelial ovarian cancer (EOC), increased THY1 expression is associated with increased proliferation and self-renewal capacity of cancer cells. [25] The N-glycosylation pattern of MGAM may be related to the progression of bladder cancer. [26] AGTR1 can inhibit tumor cell metastasis in melanoma. [27] DCN can promote apoptosis of CRC cells and play an anti-tumor role. [28] Hierarchical cluster analysis of hub genes showed that these genes could distinguish PRCC samples from normal samples and might be potential biomarkers for the diagnosis of PRCC. In addition, changes in EGF, KDR, CXCL12, REN, PECAM1, CDH5, THY1, WT1, PLAU, and DCN were associated with poorer overall survival and recurrence-free survival, suggesting that these genes may play an important role in the carcinogenesis, progression, or recurrence of PRCC.
The limitations of this study are as follows. First, the datasets we used contain a small number of samples and were downloaded from the GEO database, so their authenticity and reliability need to be considered. Second, despite a series of functional annotation and enrichment analyses of the DEGs, this study did not delve into the detailed mechanisms linking the DEGs to PRCC. Third, the hub genes screened from the PPI network should be further validated in vitro to observe their specific roles in PRCC.
In summary, the purpose of this study was to identify potential molecular markers for precise typing and efficient diagnosis and treatment of PRCC. A total of 214 DEGs and 17 hub genes were identified. However, the specific biological functions and molecular regulatory mechanisms of these genes in PRCC still need to be confirmed by further studies.

Declarations
Funding: This work received no funding.
Ethics approval and consent to participate: Not applicable.
Competing interests: The authors declare no conflict of interest.