Genome-wide profiling identification of prognostic novel signatures in papillary renal cell carcinoma based on large-scale sequencing data

Background: This study aims to identify potential biomarkers with prognostic value in papillary renal cell carcinoma (PRCC) by combining protein interaction networks with gene expression profiles from multiple cohorts. Methods: Two microarray datasets were downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified based on standardized labeling information. The protein-protein interaction (PPI) network and functional annotations of the DEGs were established, and the modules were analyzed using STRING and Cytoscape. Survival analysis of significant DEGs was performed using Kaplan-Meier curves based on a comprehensive expression score in The Cancer Genome Atlas (TCGA) cohort. Receiver operating characteristic (ROC) curves were constructed to describe the binary classifier value of the genes using the area under the curve (AUC). Additionally, immunohistochemical staining of PTTG1 protein was performed, and the survival analysis was validated in the Fudan University Shanghai Cancer Center (FUSCC) cohort. Results: A total of 473 DEGs and 38 functionally related hub genes were identified as candidate prognostic biomarkers. Eight genes enriched in cell cycle processes, namely BUB1B, CCNB1, CCNB2, MAD2L1, TTK, CDC20, PTTG1 and MCM5, were selected for further analysis. Statistical analysis of the TCGA cohort indicated that expression of the eight genes was higher in PRCC tumor tissues and was negatively correlated with patient outcome. Significantly elevated PTTG1 expression and its negative correlation with patient outcome were validated in the FUSCC cohort. Conclusions: Expression levels of the eight hub genes have strong prognostic value and may help to better understand the potential carcinogenesis of PRCC and to develop targeted therapy strategies.

The protein-protein interaction (PPI) network helps reveal the functions of proteins and describes the importance of their interactions in biological processes, molecular functions, and signal transduction.
To determine candidate biomarkers and their possible role in PRCC, this work focused on analyzing gene expression profiles, revealing potential biointeraction networks, and assessing prognostic value. We speculated that the oncogenic activity of important hub genes is associated with poor prognosis, and that these genes may be potential therapeutic targets for PRCC.

Materials And Methods
Original biological microarray data
Gene Expression Omnibus (GEO) is a public functional genomics database that stores high-throughput gene expression data from chips and microarrays. The original DNA microarray data for patients with PRCC were obtained from GEO. Two chip datasets, GSE48352 and GSE26574, were downloaded (Affymetrix GPL16311 and Affymetrix GPL11433 platforms, respectively). Probe identifiers were converted to gene symbols according to the annotation information for each platform.

Screening and identification of DEGs
The DEGs between PRCC and non-cancerous samples were screened and identified using GEO2R. Thresholds on P-values, the Benjamini and Hochberg false discovery rate (FDR) and fold change were used to filter DEGs, balancing the discovery of statistically significant genes against the number of false positives. Probe sets without a corresponding gene symbol and genes with multiple probe sets were removed or averaged, respectively. Genes with |logFC| ≥ 1 (log fold change) and P-value < 0.01 were considered statistically significant.
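The fold-change and P-value thresholds can be sketched in a few lines of Python. This is a minimal illustration and not the GEO2R implementation: the record layout (dictionaries with `symbol`, `logFC` and `p_value` keys), the function name `filter_degs` and the example records are assumptions made for this sketch; only the two cutoffs come from the text.

```python
def filter_degs(rows, lfc_cutoff=1.0, p_cutoff=0.01):
    """Return gene symbols passing |logFC| >= lfc_cutoff and P < p_cutoff.

    `rows` is a hypothetical list of per-probe records. Probes without a
    gene symbol are skipped, mirroring the removal step in the text.
    """
    degs = set()
    for row in rows:
        symbol = row.get("symbol")
        if not symbol:
            continue  # probe set with no corresponding gene symbol
        if abs(row["logFC"]) >= lfc_cutoff and row["p_value"] < p_cutoff:
            degs.add(symbol)
    return degs


# Made-up example records (not data from the study).
example_rows = [
    {"symbol": "GENE_A", "logFC": 2.3, "p_value": 0.0004},
    {"symbol": "GENE_B", "logFC": -1.5, "p_value": 0.002},
    {"symbol": "GENE_C", "logFC": 0.4, "p_value": 0.0001},  # fold change too small
    {"symbol": "GENE_D", "logFC": 1.8, "p_value": 0.03},    # P-value too large
    {"symbol": "", "logFC": 3.0, "p_value": 0.0001},        # no gene symbol
]
print(sorted(filter_degs(example_rows)))
```

For simplicity the sketch does not average genes represented by several probe sets; a gene is kept if any one of its probes passes both cutoffs.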
In this study, we used the Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org; version 10.0) online database to predict the PPI network of the DEGs and to analyze functional interactions between proteins [16]. This may help to further understand the mechanisms underlying the development and progression of PRCC. Cytoscape (version 3.5), an open-source bioinformatics software platform, was used to visualize the molecular interaction networks [17]. The Cytoscape plug-in Molecular Complex Detection (MCODE; version 1.4.2) clusters a given network based on topology to find densely connected regions [18]. The most important modules in the PPI network were selected using the criterion MCODE score > 24. Subsequently, KEGG and GO analyses of the genes in the module were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID; http://david.ncifcrf.gov; version 6.8) [19].

Hub genes selection and analysis
After applying the MCODE plug-in, a network of the 38 genes and their co-expressed genes was analyzed using the cBioPortal (http://www.cbioportal.org) online platform [20].
ClueGO is a Cytoscape plug-in that visualizes the non-redundant biological terms for large clusters of genes in a functionally grouped network [21]. The biological processes from the GO and KEGG pathway analyses of the hub genes were visualized using ClueGO [22,23]. A heat map based on a hierarchical clustering algorithm was constructed from the phenotype and gene expression profiles of 323 samples in TCGA.

Functional enrichment of DEGs
Biological properties such as biological processes (BP), molecular functions (MF), and cellular components (CC) were extracted from gene ontology (GO) enrichment analysis to determine the role of DEGs in PRCC. Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level functions and biological systems from large-scale molecular datasets generated by high-throughput experimental technologies.
DAVID was applied to discern the role of development-related signaling pathways in PRCC.
P-values < 0.05 were considered statistically significant. GO and KEGG enrichment results were displayed as bubble charts.
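Enrichment P-values of this kind are generally based on an over-representation test. The Python sketch below computes a one-sided hypergeometric P-value to illustrate the idea. It is not DAVID's own procedure (DAVID reports a modified Fisher exact P-value, the EASE score), and the function name, parameter names and example counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits, list_size, annotated, background):
    """One-sided hypergeometric P-value, P(X >= hits).

    hits       -- genes in the input list that carry the annotation
    list_size  -- number of genes in the input list
    annotated  -- genes in the background that carry the annotation
    background -- total number of background genes
    """
    upper = min(list_size, annotated)
    # Count the favourable draws as exact integers, then divide once.
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, list_size)


# Made-up counts: 12 of 38 listed genes annotated to a term that covers
# 200 of 20,000 background genes.
p = enrichment_pvalue(hits=12, list_size=38, annotated=200, background=20000)
print(p < 0.05)
```

A term would then be called enriched when its P-value falls below the 0.05 threshold used here, ideally after correction for multiple testing.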

Statistical analysis of TCGA cohort
Phenotype and expression profiles of hub genes in 323 PRCC patients from TCGA were analyzed and displayed to predict prognostic value. The expression levels of each gene were divided into three groups: high, medium and low. Subsequently, to compare the outcomes of the high and low groups, univariate survival analysis of the 8 hub genes was performed using Kaplan-Meier curves. Univariate and multivariate analyses were performed with Cox proportional hazards regression models to find independent variables, including age
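As a rough illustration of the grouping and univariate survival steps, the Python sketch below splits expression values into three rank-based groups and computes a Kaplan-Meier survival estimate. The equal-sized (tertile) cut points, the function names and the toy follow-up data are assumptions for this example; the text does not state how the three groups were defined.

```python
def tertile_labels(values):
    """Label each value 'low', 'medium' or 'high' by rank (assumed tertiles)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, idx in enumerate(order):
        if rank < n / 3:
            labels[idx] = "low"
        elif rank < 2 * n / 3:
            labels[idx] = "medium"
        else:
            labels[idx] = "high"
    return labels


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed event and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data (not from the study): follow-up in months, 1 = event, 0 = censored.
print(tertile_labels([5, 1, 3, 2, 6, 4]))
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

In practice, one curve would be estimated for the high group and one for the low group, and the two compared with a log-rank test.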

Immunohistochemical (IHC) staining and evaluation
Rabbit anti-PTTG1 monoclonal antibody was used (ab128040, Abcam, USA). Positive or negative staining of the protein in each formalin-fixed paraffin-embedded (FFPE) slide was independently evaluated by two experienced pathologists and determined as follows. The overall IHC score, ranging from 0 to 12, was calculated as the product of the staining intensity score and the staining extent score, as previously described [25].
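The scoring rule amounts to a simple product. In the Python sketch below, the ranges (intensity 0 to 3, extent 0 to 4) are an assumption chosen so that the product spans 0 to 12 as stated; the exact scales are those of the cited reference [25], and the function name is hypothetical.

```python
def ihc_score(intensity, extent):
    """Overall IHC score = staining intensity x staining extent.

    Assumed scales (not given in the text): intensity 0-3, extent 0-4,
    so that the product ranges from 0 to 12.
    """
    if not 0 <= intensity <= 3:
        raise ValueError("intensity must be between 0 and 3")
    if not 0 <= extent <= 4:
        raise ValueError("extent must be between 0 and 4")
    return intensity * extent


print(ihc_score(2, 3))  # moderate intensity, extent score 3
```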
Statistical analysis of FUSCC cohort and potential networks of the hub genes
According to the IHC score, the patients were divided into two groups: a PTTG1 high-expression group and a PTTG1 low-expression group. Correlations between PTTG1 expression and clinicopathological features were analyzed using the chi-square test. In order to explore the factors related to prognosis, age at surgery (ref.

Results
This study was divided into three phases. In the first phase, we used the information in the GEO platform to evaluate DEGs; in the second phase, we constructed a PPI network and evaluated the specificity of the interactions based on co-expression and functional annotation.
In the third phase, phenotype and expression profiles of the hub genes in 323 PRCC patients from TCGA were analyzed and displayed to predict prognostic value.

Identification of DEGs in PRCC
After normalizing the microarray results, we identified DEGs corresponding to 1,270 probes in GSE48352 and 826 probes in GSE26574. As shown in the Venn diagram, the overlap between the two datasets contained 473 genes differentially expressed between tumor tissues and adjacent normal tissues (Fig. 1A).

PPI network construction and module analysis
The PPI network of DEGs was built (Fig. 1B) and the most important modules were identified using the Cytoscape plug-in (Fig. 1C). Functional analysis of the 38 genes involved in the module was performed using DAVID. Enrichment profiles showed that hub genes in this module were primarily enriched in sister chromatid segregation (49.37%), mitotic nuclear division (13.92%) and anaphase-promoting complex-dependent catabolic process (11.39%). The functional enrichment results of the 38 DEGs are listed in Table 1; modules with both P value < 0.05 and FDR < 0.1 were considered significant.
(Table 2).

Cox regression analyses of FUSCC cohorts and potential networks of the hub genes
In univariate Cox regression analysis models, pTNM stage, AJCC stage, Fuhrman grade, tumor size and BMI were significantly associated with progression-free survival (PFS; p < 0.05; Table 3) and overall survival (OS; p < 0.05; Table 3). Importantly, PTTG1 expression markedly correlated with poor PFS (HR = 2.46, p < 0.01) and poor OS (HR = 2.90, p < 0.01). The potential networks associated with functional annotations of the hub genes include inhibition, protein interaction, ubiquitination, phosphorylation, activating expression, activation, indirect relation and dephosphorylation (Fig. 7).

Discussion
The concept of PRCC was first proposed by Mancilla-Jimenez in 1976, when thirty-four cases of RCC were found to show papillary structures. Of these, 85.3% of PRCC patients had a better prognosis than patients with other types of RCC [26]. Since PRCC is relatively rare in clinical practice and has been rarely studied, the major molecular mechanisms of its pathogenesis are poorly understood.
Therefore, potential biomarkers for efficient diagnosis and treatment are urgently needed.
In this study, a total of 473 DEGs and 38 hub genes were identified by microarray data analysis. Among the 38 hub genes, 8 genes related to the cell cycle, namely BUB1B, CCNB1, CCNB2, MAD2L1, TTK, CDC20, PTTG1 and MCM5, were subjectively selected. After statistical analysis, the 8 genes showed clear prognostic value.
BUB1B (a spindle checkpoint protein, also known as BUBR1) is an important functional protein at the mitotic checkpoint, and changes in BUB1B expression play an important role in tumorigenesis and progression [27]. Studies have found that BUB1B is overexpressed in kidney cancer and breast cancer, and that its mutation and overexpression are strongly correlated with chromosomal instability [27,28]. Yet the prognostic value of BUB1B in PRCC has rarely been reported.
As a member of the cyclin family, CCNB1 is one of the key factors related to cell cycle checkpoints [29,30]. Cyclin B1 overexpression has been found in a variety of human tumors, such as esophageal cancer, non-small cell lung cancer and tongue cancer, and is related to tumor grade, differentiation, invasion, metastasis and prognosis [31].
Thus, there is enough evidence to suspect that CCNB1 acts as an oncogene in human PRCC.
It has been reported that CCNB2 is highly expressed in tumor tissues, such as breast cancer [32], adrenal cortical carcinoma [33], colorectal adenocarcinoma [34] and pituitary adenoma [35]. It has also been reported that the serum circulating CCNB2 mRNA level in cancer patients is significantly higher than that in the normal population and in patients with benign diseases [36].
To sustain the division and proliferation of tumor cells, TTK is highly expressed in tumor cells to maintain the normal function of the spindle assembly checkpoint (SAC). When TTK function is inhibited, the SAC is impaired, errors in mitotic metaphase cannot be detected, chromosomes cannot be distributed evenly to daughter cells, and heteroploidy increases further; beyond a certain threshold, this causes tumor cell apoptosis. TTK can therefore serve as an effective anti-tumor target [39,40].
Multiple studies have shown that CDC20 can degrade several important substrate factors to regulate cell cycle progression, including Securin [41], Cyclin A [42,43], p21 [44] and

Availability of data and material:
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflict of interests:
The authors declare no competing interests.