Bioinformatics Analysis of C3 and CXCR4 as Potential Prognostic Biomarkers in Clear Cell Renal Cell Carcinoma (ccRCC)

Background: The molecular pathogenesis of ccRCC remains unclear, so ccRCC-associated genes need to be explored. Methods: Three ccRCC expression microarray datasets (GSE14762, GSE66270 and GSE53757) were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between ccRCC and normal tissue were identified. The functions of the DEGs were analyzed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment. A protein-protein interaction (PPI) network was then established to screen hub genes. The expression of the hub genes was verified in the Oncomine database, and their prognostic value in ccRCC patients was analyzed with the GEPIA database. Results: A total of 137 DEGs were identified, including 63 upregulated genes and 74 downregulated genes. These 137 DEGs were mainly enriched in 82 functional terms and 24 pathways. The 14 highest-scoring genes were screened as hub genes in the PPI network, including 12 upregulated candidate genes and 2 downregulated candidate genes. Patients with high C3 expression had poor overall survival (OS), whereas patients with high expression of CTSS and TLR3 had better OS. Patients with high C3 and CXCR4 expression had poor disease-free survival (DFS), whereas ccRCC patients with high TLR3 expression had better DFS. Finally, C3 and CXCR4 were selected as prognostic markers for patients with ccRCC. Conclusion: C3 and CXCR4 were identified as candidate biomarkers and potential therapeutic targets relevant to the molecular mechanism and individualized treatment of ccRCC.


Introduction
Renal cell carcinoma (RCC) is the most common kidney malignancy and originates in the renal tubular epithelium [1]. Among its subtypes, ccRCC is the most important histological type, accounting for ~80% of kidney cancers [2]. The vast majority of kidney cancers are discovered incidentally. Fewer than 5% of renal cancers present with the classic triad (gross hematuria, flank pain and abdominal mass), and these are often advanced. Because renal cancer is resistant to radiotherapy and chemotherapy, surgical resection remains the best treatment [2]. Although the emergence of immunotherapy and targeted therapy has diversified the treatment of renal cancer, the prognosis of patients who have lost the opportunity for surgery is extremely poor [3]. Therefore, it is particularly important to understand the pathogenesis of renal cancer and to explore biomarkers that could support early diagnosis and prediction of prognosis.
In recent years, bioinformatics analysis of gene expression microarrays has emerged as a new method for identifying potential target genes of diseases and for characterizing their molecular features, regulatory pathways and cellular networks [4]. The Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo/) database is an international public functional genomics repository that stores array and sequence data. Over the past decades, a growing number of studies have identified tumor-related genes by mining GEO datasets. For instance, using integrated bioinformatic analysis, Guo et al. found that the 31 most altered hub genes in colorectal cancer were significantly enriched in several pathways, mainly associated with the cell cycle process, chemokines and G-protein coupled receptor signaling [5]. In addition, Liang's bioinformatics analysis indicated that BCL2, CCND1 and COL1A1 may be key genes in papillary thyroid carcinoma [6].
Building on these studies, we mined and analyzed ccRCC gene expression profiles from the GEO database, and then further analyzed the data to find valuable hub genes for future research.

Access to public resources
Three expression profiling datasets associated with ccRCC (GSE14762 [7], GSE66270 [8] and GSE53757 [9]) were acquired from the GEO database. The GSE14762 dataset included 11 tumor tissue samples and matched normal tissue samples. The GSE66270 dataset included 14 normal tissue samples and 14 tumor tissue samples. The GSE53757 dataset included 72 tumor tissue samples and adjacent tissue samples. The microarray data of GSE14762 were based on the GPL4866 platform, and the microarray data of GSE66270 and GSE53757 were based on the GPL570 platform. Platform and series matrix files were downloaded as TXT files. Details of the GEO ccRCC data are shown in Table 1.

Gene ontology (GO) and KEGG Enrichment Analyses
To explore the biological processes (BP), molecular functions (MF), and cellular components (CC) of the DEGs, two online tools were applied. g:Profiler (https://biit.cs.ut.ee/gprofiler/gost) was used for GO analysis, and DAVID 6.8 (https://david.ncifcrf.gov/) was used for KEGG analysis. P < 0.05 was taken as the significance threshold for both GO and KEGG pathway analysis.
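To illustrate what an enrichment P value of this kind measures, the following minimal Python sketch computes an upper-tail hypergeometric probability for the overlap between a DEG list and one annotation term. It is an illustration only: g:Profiler and DAVID apply their own statistics and corrections, which may differ from this plain test, and the function name and all numbers in the example are hypothetical rather than taken from this study.

```python
from math import comb


def enrichment_p_value(n_background, n_term, n_degs, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of annotated genes in the background
    n_term:       number of background genes annotated to the term
    n_degs:       number of DEGs submitted
    n_overlap:    number of DEGs annotated to the term
    """
    upper = min(n_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20000 background genes, a term with 500 genes,
# 137 DEGs submitted, 11 of them annotated to the term.
p = enrichment_p_value(20000, 500, 137, 11)
print(f"enrichment P value: {p:.3g}")
```

A term would pass the threshold stated above when the returned value is below 0.05.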

PPI network construction
The online database STRING (http://string-db.org) and Cytoscape software (version 3.6.1, http://www.cytoscape.org/) were used to generate the PPI network of the DEGs and to find hub genes. In addition, the Molecular Complex Detection (MCODE) plug-in in Cytoscape was used to identify clustered sub-networks of highly interconnected nodes in this PPI network. The default MCODE parameters were used: degree cutoff ≥ 2, node score cutoff ≥ 0.2, K-core ≥ 2, and max depth = 100.
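The degree and K-core cutoffs listed above can be made concrete with a small Python sketch that computes node degrees and the k-core of an undirected network. This is not the MCODE algorithm itself, which additionally scores and expands clusters; it only illustrates the pruning idea behind those two cutoffs. The node names and edges are placeholders, not interactions retrieved from STRING.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    return graph


def k_core(graph, k):
    """Return the nodes left after repeatedly removing nodes with degree < k."""
    alive = {node: set(nbrs) for node, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in alive.items() if len(nbrs) < k]:
            for nbr in alive.pop(node):
                if nbr in alive:
                    alive[nbr].discard(node)
            changed = True
    return set(alive)


# Placeholder network: a triangle (G1, G2, G3) with a two-node tail (G4, G5).
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G3", "G4"), ("G4", "G5")]
toy_graph = build_graph(toy_edges)
degrees = {node: len(nbrs) for node, nbrs in toy_graph.items()}
core_nodes = k_core(toy_graph, 2)
print(sorted(core_nodes))
```

In this toy network only the triangle survives a 2-core filter, because removing the degree-1 node G5 leaves G4 with a single neighbor as well.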

Expression and survival analysis of hub genes
To validate the expression levels of the hub genes, the meta-analysis function of the Oncomine database (https://www.oncomine.org/) was used. In addition, GEPIA (http://gepia.cancer-pku.cn/detail.php) is an interactive web server for analyzing gene expression in tumor and normal samples. In this study, it was used to analyze the relationship between hub gene expression and survival, including overall survival (OS) and disease-free survival (DFS).
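The kind of comparison behind such survival curves can be sketched in Python as follows: patients are split into high- and low-expression groups, and a Kaplan-Meier curve is estimated for each group. This is a minimal sketch under stated assumptions, not the GEPIA implementation. The median cutoff is an assumption (the text above does not state which cutoff was used), the function names are hypothetical, and no significance test is included.

```python
from statistics import median


def split_by_median(expression):
    """Return (high, low) lists of patient indices, split at the median."""
    cutoff = median(expression)
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each distinct event time.

    times:  follow-up time per patient
    events: 1 if the event (death or relapse) was observed, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            observed += pairs[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

For example, `kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])` steps down at times 1, 3 and 4, while the patient censored at time 2 only reduces the number at risk.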

Microarray data information
The ccRCC expression microarray datasets GSE14762, GSE66270 and GSE53757 were normalized with the RMA algorithm, and the results are displayed in Fig. 1.
Using the screening criteria (P < 0.05 and |log2 FC| ≥ 2), 381 DEGs were obtained from GSE14762, 870 DEGs from GSE66270, and 1324 DEGs from GSE53757. The DEGs between the two sample groups in each of the three datasets are shown as volcano plots (Fig. 2). The cluster heatmaps of the top 100 DEGs from the three microarrays are displayed in Fig. 3.
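The screening rule above can be expressed as a short Python sketch that labels each gene as upregulated or downregulated from its log2 fold change and P value. The gene names and values below are placeholders for illustration only; they are not results from the three datasets, and the function is not the code used in this study.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str
    log2_fc: float  # log2 fold change, tumor versus normal
    p_value: float


def classify_degs(stats, p_cutoff=0.05, lfc_cutoff=2.0):
    """Split genes passing P < p_cutoff and |log2 FC| >= lfc_cutoff into up/down."""
    up, down = [], []
    for s in stats:
        if s.p_value < p_cutoff and abs(s.log2_fc) >= lfc_cutoff:
            (up if s.log2_fc > 0 else down).append(s.gene)
    return up, down


# Placeholder statistics (hypothetical values).
toy_stats = [
    GeneStat("GENE_A", 3.1, 0.001),   # passes, upregulated
    GeneStat("GENE_B", -2.4, 0.01),   # passes, downregulated
    GeneStat("GENE_C", 2.5, 0.2),     # fails the P value cutoff
    GeneStat("GENE_D", 1.2, 0.001),   # fails the fold-change cutoff
]
up_genes, down_genes = classify_degs(toy_stats)
print(up_genes, down_genes)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant P value is excluded, as is a significant gene with a small fold change.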

Identification of DEGs in ccRCC
The three kidney cancer microarray datasets were analyzed with the limma package (threshold: P < 0.05 and |log2 FC| ≥ 2), and the resulting gene lists were then integrated by the robust rank aggregation (RRA) method. As a result, 137 DEGs were identified, comprising 63 overexpressed genes and 74 under-expressed genes (Table 2). Figure 4 shows the heatmap of the top 20 overexpressed and under-expressed genes, drawn with R heatmap software.

GO and KEGG analysis of DEGs
The DAVID database was used to further understand the functions of the DEGs, including BP, CC and MF. Significant results of the GO enrichment analysis of DEGs in ccRCC are shown in Table 3. As shown in Fig. 5A and 5B, GO analysis (threshold: P < 0.05 and count ≥ 2) demonstrated that the DEGs of ccRCC were mainly enriched in 50 terms in the BP group, such as response to hypoxia, oxidation-reduction process and proteolysis. In the CC group, the DEGs were enriched in 21 terms, such as extracellular exosome, plasma membrane and integral component of membrane. Similarly, in the MF group, the DEGs were enriched in 11 terms, such as identical protein binding, receptor binding and heparin binding. Figure 5C illustrates the relationships between the different functional terms, drawn with Cytoscape.
Table 3 (excerpt) GO enrichment analysis of DEGs in ccRCC
Term | Count | P value | Genes
… | … | 0.004463 | KNG1, ENPP6, FGF9, C3, APOC1, UMOD, CTSS, PLG, C1QA, C1QB, AFM, SOST, TNFSF13B, SFRP1, SERPINA5, TGFBI, HRG, FGF1, TREM2, CASP1, IGFBP3, ANGPTL4
GO:0009986 ~ cell surface | 11 | 0.005205 | SFRP1, CXCR4, LGALS1, TLR2, SLC34A1, FCER1G, TLR3, HILPDA, ITGB2, …
To further analyze the above DEGs, KEGG pathway enrichment analysis was performed. Table 4 and Fig. 5D show the significantly enriched pathways. The DEGs were enriched in 24 pathways, mainly metabolic pathways and phagosome, among others.

PPI network and module analysis
The STRING database was used to generate the PPI network of the DEGs in ccRCC. Figure 6A shows the relationships among the 137 candidate genes. The MCODE application was then used to screen out the highest-scoring nodes, and Fig. 6B displays the module with the highest score (score = 10, nodes = 11, edges = 50). As a result, MCODE selected the 14 nodes with the highest scores, including 12 upregulated candidate genes (C1QA, C1QB, C3, CTSS, CXCR4, FCER1G, ITGB2, TLR2, TLR3 and TYROBP) and 2 downregulated candidate genes (AQP2 and PLG).

Expression and survival analysis of hub genes
To further explore the expression and prognostic value of the screened genes, the Oncomine and GEPIA databases were used. Six analyses were obtained from the Oncomine database (Fig. 7). The meta-analysis indicated that 10 genes were significantly differentially expressed (P < 0.05). Figure 8 shows the OS and DFS curves for these 10 genes. ccRCC patients with high C3 expression had poor OS, whereas patients with high expression of CTSS and TLR3 had good OS. In addition, high C3 and CXCR4 expression indicated poor DFS, whereas high TLR3 expression indicated good DFS. Finally, C3 and CXCR4 were selected as prognostic markers for patients with ccRCC.

Discussion
Kidney cancer accounts for about 2-3% of adult malignant tumors, and RCC accounts for 80-90% of adult renal malignancies. Worldwide, kidney cancer ranks 13th among common tumors. In 2012, about 338 000 new cases of kidney cancer were diagnosed, accounting for 2.4% of all tumors, and there were 144 000 deaths, accounting for 1.7% of all tumor deaths [11]. In China, a long-term National Cancer Registry survey of kidney cancer incidence and mortality found that the incidence was close to the world average and is increasing year by year [12]. Renal cell carcinoma (RCC) is the most common kidney malignancy. Its early symptoms are not obvious, and most patients are diagnosed at an advanced stage or with metastasis [13]. RCC is prone to recurrence and metastasis because of the complexity of its causes and pathogenesis, and it is insensitive to traditional chemoradiotherapy. For these reasons, RCC usually leads to poor clinical outcomes. A better understanding of the molecular mechanisms of renal cancer would therefore help to improve its diagnosis, treatment and prognosis.
With the development of sequencing technology and bioinformatics, collecting and analyzing existing data can help to explore the pathogenesis of RCC and to discover possible diagnostic and therapeutic biomarkers [14]. Bioinformatics research analyzes biological data, proposes genes or gene groups related to disease development, and then verifies the results experimentally, which is a highly efficient research approach. At present, bioinformatics is widely applied across medical research, including the discovery of disease-related genes, clinical diagnosis, individualized treatment, and the identification of new molecular targets for drug discovery, and it plays a very important role in these areas [15].
In this study, we downloaded three microarray datasets from the GEO database. To examine the relationship between the hub genes and ccRCC, the Oncomine platform was used. The meta-analysis showed that 10 hub genes were significantly abnormally expressed, and all of them were highly expressed in ccRCC. Based on this result, we used the GEPIA platform to assess the prognostic value of the 10 hub genes. We found that ccRCC patients with high C3 expression had poor OS, whereas patients with high expression of CTSS and TLR3 had better OS. In addition, high C3 and CXCR4 expression indicated poor DFS, whereas high TLR3 expression indicated good DFS.
As a protein-coding gene, complement component 3 (C3) is involved in the occurrence and development of many diseases, including autosomal recessive C3 deficiency and atypical hemolytic uremic syndrome 5 [16]. Its related pathways include the lectin-induced complement pathway of the immune response and signaling by GPCR. Previous reports indicated that C3 is a potential prognostic marker, and possibly a new immune marker, for non-small cell lung cancer (NSCLC), because C3 expression was significantly lower in stage III than in stage I NSCLC [17,18]. Our results show that C3 expression was increased in RCC and that the increased expression was related to worse prognosis.
CTSS (cathepsin S) is a protein-coding gene. Previous studies of papillary thyroid carcinoma showed that CTSS was highly expressed and related to transformation [19]. Those results indicated that high CTSS expression was associated with poor prognosis and lymph node metastasis, which supported regarding CTSS as an oncogene [20]. In our data, however, CTSS was indeed highly expressed in RCC, but its overexpression was associated with better prognosis, the opposite of the expected relationship between expression and prognosis. We speculate that CTSS might be degraded after binding to other targets, or inhibited in downstream pathways, which still needs further exploration. TLR3 is a member of the Toll-like receptor (TLR) family, and past studies have shown that it is abnormally expressed in a variety of tumors, including breast, ovarian and prostate tumors. However, TLR3 has been associated with either better or poorer clinical outcomes in different cancers [21,22]. We therefore also analyzed TLR3 expression in RCC and found that it behaved like CTSS: TLR3 was highly expressed in RCC, but high expression was related to better prognosis.
C-X-C chemokine receptor 4 (CXCR4) belongs to the superfamily of seven-transmembrane domain, heterotrimeric G-protein-coupled receptors and is associated with cell proliferation, migration, invasion and survival. Previous reports have demonstrated that CXCR4 is upregulated in sporadic vestibular schwannomas (VS) as well as in neurofibromatosis type 2 (NF2) tumors [23][24][25]. In our results, CXCR4 was overexpressed in RCC, and higher expression was related to poor prognosis.

Conclusion
In summary, through bioinformatics analysis, 2 ccRCC-associated candidate genes (C3 and CXCR4) were identified using three expression profile datasets from the GEO database. In addition, CTSS and TLR3 are abnormally expressed in ccRCC and are associated with ccRCC prognosis; however, their high expression is associated with better, rather than worse, prognosis. These novel biomarkers may have important clinical significance for the diagnosis and prognosis of renal carcinoma, but the detailed mechanisms of their action in the development of renal carcinoma need to be further explored.

Declarations
Data Availability
The data used to support the findings of this study are included within the article. The gene expression data can be accessed on the Gene Expression Omnibus (GEO). The Oncomine database was used to obtain the expression profiles of the hub genes. The overall survival and disease-free survival analyses of the genes were obtained from the GEPIA database.
Figure 1 Standardization of gene expression, shown as boxplots. The GSE14762 data (A), GSE66270 data (B) and GSE53757 data (C) were standardized.
Figure 3 Clustering heatmaps of DEGs. The three panels show the heatmaps of the GSE14762 data (A), GSE66270 data (B) and GSE53757 data (C). Red cells indicate overexpressed genes, green cells indicate under-expressed genes, and black cells indicate no significant difference; gray cells indicate genes whose signal was too weak to be detected.
Figure 7 The expression level of 13 hub genes among 6 different analysis datasets in the ONCOMINE database.