Identification of Novel Biomarkers and Targets for Natural Products in Inhibiting Prostate Cancer

Background: Prostate cancer (PCa) is a common urinary system malignancy. The lack of specific and sensitive biomarkers for the diagnosis and prognosis of PCa makes it important to seek alternatives. Meanwhile, targeted PCa inhibitors are limited. Natural products that potentially target PCa may offer a useful approach.
Methods: Expression profile datasets for PCa from GEO were analyzed. Core differential genes were identified with String and Cytoscape. GEPIA and HPA were used to further validate the key genes. The targets of natural products were obtained from the DrugBank, Therapeutic Target Database, BindingDB, PubChem, and ChEMBL databases, and PCa therapeutic targets were collected from the GeneCards, OMIM, and PharmGkb databases. Cytoscape was also used to screen the core modules and disease-drug targets. Molecular docking models of drug-core targets were constructed with AutoDock to confirm the accuracy of the targets.
Results: Four identified biomarkers, CENPF, TPX2, TK1 and CCNB1, were verified by HPA. Five novel PCa biomarkers, RRM2, UBE2C, TOP2A, BIRC5 and ZWINT, were also identified. All nine markers were verified by GEPIA to indicate poor prognosis for PCa patients. PCa carcinogenesis was found to be mainly associated with the hepatic fibrosis pathway, ILK signaling, the NRF2-mediated oxidative stress response and many others. Four key PCa targets for curcumin (EP300, RELA, EGFR, NFKB1), seven for paclitaxel (PTEN, EGFR, ERBB2, TP53, KRAS, AR, AKT1) and two for ursolic acid (GSK3B, RELA) were identified by Cytoscape combined with KEGG and verified by AutoDock.
Conclusions: The novel biomarkers identified in our study would be valuable for the diagnosis and prognosis of PCa. Key targets of curcumin, paclitaxel, and ursolic acid in PCa could lay a solid foundation for precise treatment and molecularly targeted therapy.


Background
The incidence of prostate cancer (PCa) is on the rise globally (1). The prevalence of PCa and associated deaths in China has risen steeply in recent years due to rising living standards, changing diets and an aging population (2). Prostate-specific antigen (PSA) is the most common marker used to identify an increased risk of PCa, but the PSA level is not a highly accurate indicator and can be influenced by many other factors, such as alcohol consumption and inflammation (3). According to the Chinese expert consensus on genomic testing of PCa patients (2020 edition) and Role of Genetic Testing for Inherited PCa Risk: Philadelphia PCa Consensus Conference 2017, the number of biomarkers that can be used in the early diagnosis of PCa is relatively small, and their specificity and sensitivity are low (4,5). Therefore, it is particularly important to screen novel, highly specific and sensitive biomarkers for the diagnosis and prognosis of PCa.
Common treatment options for PCa include surgical resection, radiotherapy (6), endocrine therapy, and immunotherapy (7). However, the number of adverse side effects and frequent progression to refractory PCa often have a significant impact on the health and quality of life of PCa patients (8). Molecular targeted therapy (MTT) refers to the inhibition of tumor growth and development by interfering with specific molecules that increase the proliferation and migration of tumor cells (9). Currently there are only a few molecularly targeted drugs for PCa (10), and since none of them is very specific, it is essential to identify alternatives.
Natural products are chemicals obtained from naturally growing organisms (11). Curcumin is extracted from turmeric and is an important active ingredient in traditional Chinese medicine. Curcumin has good antioxidant, antimicrobial, anti-rheumatic and anti-tumor activities, and can also improve the function of the immune, cardiovascular, digestive and nervous systems (12). Paclitaxel is a natural secondary metabolite isolated and purified from the bark of the Pacific yew (13). Paclitaxel and its synthetic variants are known to have anti-cancer properties and have been used extensively in the clinic, but their use for the treatment of PCa is less frequent and has been described primarily in laboratory studies (14). Ursolic acid is extracted from fruit rind and has sedative, anti-inflammatory, antibacterial, anti-diabetic, anti-ulcer and antioxidant properties (15). Recent studies have shown that all three natural products have significant inhibitory effects on cancer, but their mechanisms of action and targets in PCa have not been thoroughly investigated. Therefore, this study aimed to identify the targets of these three natural products in PCa in order to provide theoretical support for their use in MTT of PCa.
In our study, genes associated with the diagnosis and prognosis of PCa were screened using bioinformatics tools, and key targets of natural products that act on PCa were analyzed as well. A molecular model of natural product docking with PCa targets was constructed based on network pharmacology to confirm the accuracy of the PCa-natural product targets. This study provides support for the development of MTT for PCa.

Raw data selection
Differences in gene expression between PCa and normal tissues were analyzed using the GEO module on the Assistant for Clinical Bioinformatics (Aclbi) website (www.Aclbi.com), by selecting datasets in the Dataset Screening and Differential Gene Analysis sub-modules. Gene expression data for both GSE3325 (16) and GSE46602 (17) were based on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). The GSE3325 dataset contains information from 6 benign prostate tissues and 13 PCa tissues, and the GSE46602 dataset contains information from 14 benign prostate tissues and 36 PCa tissues (Additional file: Table S1). (The GEO data in Aclbi are updated every 3 months; the version updated on May 17, 2021 was used.)

Data processing and screening of differentially expressed genes (DEGs)
... (https://www.proteinatlas.org/) (Version: 20.1) (25). The pan-cancer analysis of the core genes was performed in the Timer2 database (http://timer.cistrome.org/) (26).

PCa treatment targets
Validated PCa targets were identified in the GeneCards (27) (https://www.genecards.org/), Online Mendelian Inheritance in Man (28) (OMIM, https://omim.org/) and PharmGkb (29) (https://www.pharmgkb.org/) databases. Targets with relevance scores > 1 in the GeneCards database were considered significant. The targets retrieved from the three databases were combined with the core differential genes and duplicates were removed, yielding the set of therapeutic targets for PCa.
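The merging step described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `merge_pca_targets`, the input shapes, and the placeholder gene symbols and scores are hypothetical and are not taken from the study. Only the GeneCards relevance cutoff (> 1) and the removal of duplicates come from the text.

```python
def merge_pca_targets(genecards_scores, omim, pharmgkb, core_degs, min_relevance=1.0):
    """Combine disease targets from three sources with the core DEGs.

    genecards_scores: dict mapping gene symbol -> GeneCards relevance score.
    omim, pharmgkb, core_degs: iterables of gene symbols.
    Building a set removes duplicate symbols automatically.
    """
    kept = {gene for gene, score in genecards_scores.items() if score > min_relevance}
    return kept | set(omim) | set(pharmgkb) | set(core_degs)


# Placeholder inputs (hypothetical symbols and scores, for illustration only).
example = merge_pca_targets(
    {"GENE_A": 12.4, "GENE_B": 0.7, "GENE_C": 1.0},
    omim=["GENE_A", "GENE_D"],
    pharmgkb=["GENE_D"],
    core_degs=["GENE_E"],
)
print(sorted(example))
```

Note that a score of exactly 1 is excluded here because the text specifies scores strictly greater than 1.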

Disease-natural product targets
Intersections of the PCa and curcumin/paclitaxel/ursolic acid targets were identified in Excel following Duilio's method (35), and the resulting targets were designated pharmacodynamic targets of curcumin/paclitaxel/ursolic acid in PCa.
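The study performed this intersection in Excel. The same set logic can be sketched in Python as follows; this is an equivalent illustration rather than the authors' procedure, and the function name `pharmacodynamic_targets` and all gene symbols are hypothetical placeholders.

```python
def pharmacodynamic_targets(disease_targets, drug_targets_by_compound):
    """Intersect the disease target set with each compound's target list.

    disease_targets: iterable of gene symbols for the disease.
    drug_targets_by_compound: dict mapping compound name -> iterable of gene symbols.
    Returns a dict mapping compound name -> sorted list of shared symbols.
    """
    disease = set(disease_targets)
    return {
        compound: sorted(disease & set(targets))
        for compound, targets in drug_targets_by_compound.items()
    }


# Placeholder symbols only; none of these lists come from the study.
shared = pharmacodynamic_targets(
    ["GENE_A", "GENE_B", "GENE_C"],
    {
        "curcumin": ["GENE_A", "GENE_X"],
        "paclitaxel": ["GENE_C", "GENE_B"],
        "ursolic acid": ["GENE_Y"],
    },
)
print(shared)
```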

Construction of PPI networks of PCa-natural product targets and screening of core modules
Information about the PCa-natural product targets was imported into the String database to construct a network of interactions between targets, and interactions with a confidence score of 0.4 or more were retained. The inter-target interaction scores were then imported into Cytoscape for core module screening with the CytoNCA (36) plugin, and the medians of the Betweenness Centrality (BC), Closeness Centrality (CLC), Degree Centrality (DC), Eigenvector Centrality (EC), Local Average Connectivity-based method (LAC) and Network Centrality (NC) scores were used as the basis for screening genes to obtain the core motifs.
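The median-based screening can be illustrated with a short Python sketch. It assumes the six scores have already been exported from CytoNCA into a dictionary, and it assumes that a node is kept only if it exceeds the median on all six metrics; the text does not state whether all or any of the metrics must pass, so that rule is an assumption. The node names and scores are hypothetical.

```python
from statistics import median

METRICS = ("BC", "CLC", "DC", "EC", "LAC", "NC")


def screen_core_nodes(scores, metrics=METRICS):
    """Keep nodes whose score is above the median for every metric.

    scores: dict mapping node name -> dict of metric name -> value.
    Returns (cutoffs, core_nodes); cutoffs holds the median of each metric.
    """
    cutoffs = {m: median(s[m] for s in scores.values()) for m in metrics}
    core = [
        node
        for node, s in scores.items()
        if all(s[m] > cutoffs[m] for m in metrics)  # assumption: all six must pass
    ]
    return cutoffs, core


# Hypothetical network of five nodes; node k gets the value k on every metric.
toy_scores = {f"n{k}": {m: float(k) for m in METRICS} for k in range(1, 6)}
cutoffs, core = screen_core_nodes(toy_scores)
print(cutoffs["DC"], core)
```

Repeating the call on the retained nodes would give a second round of screening, if a smaller core module is wanted.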

GO and KEGG enrichment analysis of disease-natural product targets
The DAVID database was used for the GO and KEGG analysis of disease-natural product targets, with the p-value cutoff set to 0.05 or less. The Hiplot online platform was used to plot the enrichment results for disease-natural product targets with a p-value < 0.05 and q-value < 0.05.
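As an illustration of the combined p-value and q-value filter, the sketch below computes q-values with the Benjamini-Hochberg procedure and keeps terms passing both cutoffs. The text does not say which multiple-testing correction Hiplot applies, so Benjamini-Hochberg is an assumption here; the function names and term labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant_terms(terms, p_cutoff=0.05, q_cutoff=0.05):
    """terms: list of (name, p_value). Keep names with p < p_cutoff and q < q_cutoff."""
    qvalues = benjamini_hochberg([p for _, p in terms])
    return [
        name
        for (name, p), q in zip(terms, qvalues)
        if p < p_cutoff and q < q_cutoff
    ]


# Hypothetical enrichment terms and p-values, for illustration only.
toy_terms = [("term_a", 0.01), ("term_b", 0.04), ("term_c", 0.03), ("term_d", 0.5)]
print(significant_terms(toy_terms))
```

In this toy input, `term_b` and `term_c` pass the raw p-value cutoff but not the q-value cutoff, which is why applying both filters is stricter than the p-value alone.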

Construction of molecular docking models
Protein receptors were identified based on the KEGG enrichment results of the disease-natural product targets. The small-molecule ligands were curcumin/paclitaxel/ursolic acid. 2D structures of curcumin/paclitaxel/ursolic acid were downloaded from the PubChem database and converted into 3D structures with ChemOffice software (version 14.0.0.117), and each structure was adjusted to minimize its energy (minimum RMS gradient = 0.01). Protein receptor 3D structures were obtained from the Protein Data Bank (37) (PDB, https://www1.rcsb.org/). In PyMOL (version 2.4.0) (38), the water molecules and the original small-molecule ligand were removed from the 3D structure. Hydrogens were added to the 3D protein structure using AutoDockTools (version 1.5.6) (39), and a grid box was added to the protein structure with different reference values depending on the protein receptor. Molecular docking was performed in AutoDock Vina with the energy range set to 5. Targets with a binding energy lower than −5 kcal/mol were retained; the lower the binding energy, the higher the affinity between the small molecule and the target. The resulting models were displayed using PyMOL.
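The retention rule for docking results can be written as a small Python helper. This is a minimal sketch: the function name `retain_targets` and the receptor names and energies in the example are hypothetical placeholders, and only the −5 kcal/mol cutoff comes from the text.

```python
def retain_targets(binding_energies, cutoff=-5.0):
    """Keep receptors whose best binding energy (kcal/mol) is below the cutoff.

    binding_energies: dict mapping receptor name -> lowest binding energy.
    Returns (name, energy) pairs sorted from strongest to weakest binding,
    since a lower (more negative) energy indicates higher affinity.
    """
    kept = {name: e for name, e in binding_energies.items() if e < cutoff}
    return sorted(kept.items(), key=lambda item: item[1])


# Hypothetical docking scores, for illustration only.
ranked = retain_targets({"RECEPTOR_A": -7.2, "RECEPTOR_B": -4.1, "RECEPTOR_C": -5.6})
print(ranked)
```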

Results
In IPA, |logFC| > 1 and p-value < 0.05 were set as the cutoff values to obtain DEGs from the GEO module.
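The cutoff rule can be expressed as a short Python filter. This is a sketch of the thresholding logic only, not of the Aclbi or IPA workflow; the function name `select_degs`, the row layout, and the example genes and statistics are hypothetical.

```python
def select_degs(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    rows: iterable of (gene, log_fc, p_value) tuples.
    A gene is a DEG when |log_fc| > lfc_cutoff and p_value < p_cutoff.
    """
    up, down = [], []
    for gene, log_fc, p_value in rows:
        if p_value < p_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return up, down


# Hypothetical differential expression table, for illustration only.
toy_rows = [
    ("GENE_A", 2.3, 0.001),   # upregulated, significant
    ("GENE_B", -1.4, 0.02),   # downregulated, significant
    ("GENE_C", 0.6, 0.001),   # fold change too small
    ("GENE_D", 1.8, 0.2),     # not significant
]
up_genes, down_genes = select_degs(toy_rows)
print(up_genes, down_genes)
```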

GO and IPA pathway analysis
After analysis of the information in the DAVID database, GO analysis revealed that biological processes (BPs) were enriched in muscle system process, cell junction assembly, epithelial cell proliferation, muscle contraction, extracellular matrix organization, regulation of epithelial cell proliferation, extracellular structure organization, regulation of actin filament-based process, muscle cell differentiation, and others (Fig. 2a). MFs were enriched in receptor ligand activity, signaling receptor activator activity, sulfur compound binding, actin binding, enzyme inhibitor activity, glycosaminoglycan binding, heparin binding, tubulin binding, extracellular matrix structural constituent and others (Fig. 2b). CCs were enriched in collagen-containing extracellular matrix, cell-cell junction, cell-substrate junction, focal adhesion, membrane raft, membrane microdomain, membrane region, contractile fiber, myofibril, microtubule, sarcomere, cell leading edge, and apical part of cell (Fig. 2c). IPA pathway analysis indicated that the major pathways associated with the carcinogenesis of PCa include: hepatic fibrosis/hepatic stellate cell activation, ILK signaling, xenobiotic metabolism PXR signaling pathway, xenobiotic metabolism AHR signaling pathway, LPS/IL-1 mediated inhibition of RXR function, xenobiotic metabolism CAR signaling pathway, SPINK1 pancreatic cancer pathway, aryl hydrocarbon receptor signaling, NRF2-mediated oxidative stress response, MSP-RON signaling in cancer cells pathway, osteoarthritis pathway, agranulocyte adhesion and diapedesis, nicotine degradation II, human embryonic stem cell pluripotency, amyotrophic lateral sclerosis signaling, MSP-RON signaling pathway, WNT/β-catenin signaling, endocannabinoid cancer inhibition pathway and others (Fig. 2d).

PPI network construction, core genes and modules screening

Validation of core prognosis genes by GEPIA
The expression levels of the DEGs were further validated in the GEPIA database using cutoff values of |logFC| > 1 and p < 0.01. Nine key candidate genes that may be closely related to the occurrence and development of PCa were then screened out, namely UBE2C, TPX2, CENPF, TOP2A, CCNB1, ZWINT, RRM2, BIRC5, and TK1 (Fig. 4a). According to GEPIA's disease-free survival analysis, all nine genes were closely related to the poor prognosis of PCa (Fig. 4b). Survival analysis showed that the hazard ratios of all validated genes were greater than 1, indicating that they were all high-risk genes for PCa with poor prognostic outcomes.

Validation of key diagnosis genes by HPA
According to the results of the GEPIA expression analysis, these 9 key genes were highly expressed in PCa and lowly expressed in normal tissues. The HPA database, which contains abundant clinical immunohistochemical (IHC) sample data, can be used as a tool to validate the key diagnostic genes. In the HPA analysis, CENPF, TPX2, TK1 and CCNB1 were found to be the most extensively studied diagnostic genes among the 9 candidates (Fig. 5a); they were expressed only in prostate tumor tissues and not in normal prostate tissues. RRM2 was not expressed in either normal or tumor tissues. UBE2C, TOP2A and BIRC5 were expressed in both normal and tumor tissues with no significant difference. ZWINT has no data in this database. After validation by HPA, CENPF, TPX2, TK1 and CCNB1 were considered more certain to be involved in the carcinogenesis of PCa and can be used as diagnostic biomarkers. The other 5 genes, RRM2, UBE2C, TOP2A, BIRC5 and ZWINT, are potential diagnostic markers for PCa that need further validation. The pan-cancer analysis in the Timer2 (40) database indicated that the expression of all 9 genes was significantly upregulated in most cancers including PCa, which suggests their potential as diagnostic markers of PCa (Fig. 5b).

Disease-drug targets initial identification
The GeneCards, OMIM and PharmGkb databases were used to search for PCa targets. 10,055 targets were found in the GeneCards database, 403 in the OMIM database and 411 in the PharmGkb database.
After combining the analysis with the DEGs and removing duplicate targets, a total of 10,366 targets were identified for PCa (Fig. 6, Additional file: Table S2).

PCa-curcumin initial targets
A total of 141 curcumin targets were found in the DrugBank, TTD, BindingDB, PubChem and ChEMBL databases after removing duplicate targets (Additional file: Table S3), and a total of 114 intersections of the initial PCa and curcumin targets were obtained (Table 2).

PCa-paclitaxel initial targets
After searching the above 5 databases, 121 targets of paclitaxel were obtained after removing duplicate targets (Additional file: Table S3), and 105 targets were obtained after taking the intersection with the PCa targets (Table 2), which constituted the initial PCa-paclitaxel targets.

PCa-paclitaxel secondary targets
A PPI network graph of PCa-paclitaxel targets was constructed from the String database. After removing unconnected nodes, 100 nodes and 951 edges were obtained (Fig. 8a). The PPI network was analyzed in Cytoscape using the CytoNCA plugin, and the Cytoscape output was analyzed again with R. The targets were screened according to the medians of the BC, CLC, DC, EC, NC and LAC scores, and all targets with scores larger than the median were retained. The core motifs were then obtained by calculation (Fig. 8c, 8d). Finally, the secondary core PCa-paclitaxel targets were identified as SRC, ATM, PTEN, CASP3, CDKN2A, NOTCH1, EGFR, ERBB2, TP53, KRAS, AR, AKT1 and BRCA1.

PCa-ursolic acid secondary targets
A PPI network graph of PCa-ursolic acid targets was constructed from the String database. A PPI graph with 38 nodes and 78 edges was obtained after removing unconnected nodes (Fig. 9a). The PPI network was analyzed in Cytoscape using the CytoNCA plugin, and the Cytoscape output was analyzed again with R. The targets were screened according to the medians of the BC, CLC, DC, EC, NC and LAC scores, and all genes with scores larger than the median were retained. The core motifs were obtained by calculation. The targets were screened based on BC: 17.015873015; CLC: 0.359223301; DC: 3; EC: 0.054310229; LAC: 1; NC: 1.5833333335 using CytoNCA and R. Core motifs with 7 nodes and 11 edges were obtained afterward (Fig. 9b, 9c), and the secondary core PCa-ursolic acid targets were identified as PTGS2, PLA2G1B, PPARA, RELA, HIF1A, GSK3B and HDAC1.

Functional enrichment results for disease-drug targets
GO and KEGG enrichment for disease-drug targets was performed using the DAVID database, and the results were analyzed and plotted on the Hiplot website to explore the pathways most relevant to PCa carcinogenesis and to find the targets highly associated with the PCa pathway.

Functional analysis of PCa-ursolic acid targets
As shown in Fig. 10c, GO analysis revealed that BPs were enriched in histone H3 deacetylation, oxidation-reduction processes, protein deacetylation, peptidyl-tyrosine dephosphorylation, negative regulation of myotube differentiation, regulation of transcription from the RNA polymerase II promoter in response to hypoxia, negative regulation of the insulin receptor signaling pathway, response to hypoxia, peptidyl-proline hydroxylation to 4-hydroxy-L-proline, positive regulation of receptor biosynthetic process, histone deacetylation, and inflammatory responses. MFs were enriched in NAD-dependent histone deacetylase activity (H3-K14 specific), histone deacetylase binding, protein deacetylase activity, histone deacetylase activity, protein tyrosine phosphatase activity, transcription factor binding, enzyme binding, peptidyl-proline 4-dioxygenase activity, NF-kappaB binding, repression of transcription factor binding, and protein kinase binding. CCs were enriched in the histone deacetylase complex, nucleus, cytosol, nucleoplasm, protein complexes, cytoplasm, transcriptional repressor complexes, Golgi apparatus, and extracellular exosomes. In terms of KEGG pathway enrichment, the targets were concentrated in viral carcinogenesis, insulin resistance, HIF-1 signaling pathway, pathways in cancer, arachidonic acid metabolism, renal cell carcinoma, insulin signaling pathway, galactose metabolism, starch and sucrose metabolism, alcoholism, thyroid hormone signaling pathway, regulation of lipolysis in adipocytes, and steroid hormone biosynthesis. Based on the KEGG pathway enrichment results, 3 targets were enriched in the PCa pathway: GSK3B, SRD5A2 and RELA.

Construction of molecular docking models
Molecular docking was used to determine the tightness of binding between the target proteins and the natural products, in order to confirm the final PCa-natural product targets.

Curcumin-receptor protein molecular docking models
Four final PCa-curcumin targets, namely EP300, RELA, EGFR, and NFKB1, were identified from the intersection of the secondary core targets and the 7 targets from the KEGG pathway enrichment results. The four targets were used as protein receptors in the molecular docking models, with curcumin as the small-molecule ligand, to confirm the final PCa-curcumin targets. When EP300 was used as the protein receptor, the spacing of the grid box in AutoDockTools was set to 1, the number of points in the x-, y-, and z-dimensions was set to 40, and the offset values of the Center Grid Box were: x center: −40.097; y center: 97.454; and z center: 194.199. Molecular docking was performed using Vina, which resulted in a model located at the center of the active site with the lowest binding energy, where the binding energy was −6.9 kcal/mol. PyMOL was used to obtain the model (Fig. 11a). The molecular docking models for the other three genes were constructed by the same method. For RELA, the spacing of the grid box in AutoDockTools was set to 1, the number of points in the x-, y-, and z-dimensions was set to 40, and the offset values of the Center Grid Box were: x center: 62.502; y center: 11.629; and z center: 37.997. The binding energy of the model was −6.9 kcal/mol. For EGFR, the spacing of the grid box in AutoDockTools was set to 1, the number of points in the x-, y-, and z-dimensions was set to 40 (Fig. 11b, 11c, 11d). The binding energies of the four targets to curcumin were lower than −5 kcal/mol, indicating that the binding was stable and the affinity was strong, so all four targets were confirmed as the final PCa-curcumin targets.
... 3.298; the binding energy was −6.9 kcal/mol (Fig. 12b-g). The binding energies of the seven targets to paclitaxel were lower than −5 kcal/mol, indicating that the binding was stable and the affinity was strong, so all seven targets were confirmed as the final PCa-paclitaxel targets.

Ursolic acid-receptor protein molecular docking models
Two final PCa-ursolic acid targets, namely GSK3B and RELA, were found from the intersection of the secondary core targets and the 3 targets from the KEGG pathway enrichment results. They were used as the protein receptors in the molecular docking models, with ursolic acid as the small-molecule ligand. When GSK3B was used as the protein receptor, the spacing of the grid box in AutoDockTools was set to 1, the number of points in the x-, y-, and z-dimensions was set to 40, and the offset values of the Center Grid Box were: x center: 0.296; y center: 3.935; and z center: −4.831. The model with the lowest binding energy was obtained using Vina for molecular docking, where the binding energy was −7.4 kcal/mol. When RELA was used as the protein receptor, the spacing of the grid box in AutoDockTools was set to 1, the number of points in the x-, y-, and z-dimensions was set to 40, and the offset values of the Center Grid Box were: x center: 62.502; y center: 11.629; and z center: 37.997. The model with the lowest binding energy was obtained using Vina for molecular docking, where the binding energy was −8.1 kcal/mol. PyMOL was used to obtain the models (Fig. 13a, 13b). The binding energies of the two targets to ursolic acid were lower than −5 kcal/mol, indicating that the binding was stable and the affinity was strong, so both targets were confirmed as the final PCa-ursolic acid targets.
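To make the grid box settings concrete, the following Python sketch turns them into the text of an AutoDock Vina configuration file, using the GSK3B values reported above. It is an illustration under assumptions: the box edge length is taken to be the number of points times the spacing (40 × 1 = 40 Å), the file names `gsk3b.pdbqt` and `ursolic_acid.pdbqt` are hypothetical, and the helper `vina_box_config` is not part of any docking package.

```python
def vina_box_config(receptor, ligand, center, npts=40, spacing=1.0, energy_range=5):
    """Build the text of a Vina config file from AutoDockTools-style grid settings.

    center: (x, y, z) of the grid box center in angstroms.
    npts, spacing: points per dimension and grid spacing; their product is
    used as the box edge length (an assumption stated in the lead-in).
    """
    size = npts * spacing
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {size}",
        f"size_y = {size}",
        f"size_z = {size}",
        f"energy_range = {energy_range}",
    ]
    return "\n".join(lines) + "\n"


# GSK3B grid center as reported in the text; file names are placeholders.
config_text = vina_box_config(
    "gsk3b.pdbqt", "ursolic_acid.pdbqt", center=(0.296, 3.935, -4.831)
)
print(config_text)
```

The resulting text could be saved to a file and passed to Vina through its `--config` option.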

PCa biomarkers analysis
According to the HPA results, the 9 PCa-associated genes can be classified into 2 groups: the well-studied biomarkers and the relatively less-studied biomarkers. According to the clinical IHC results, CENPF, TPX2, TK1 and CCNB1 were expressed only in prostate tumor tissues and not in normal prostate tissues (Fig. 5a), while there was no significant difference in the expression of RRM2, UBE2C, TOP2A, BIRC5 and ZWINT between PCa and normal prostate tissues. However, the lack of a significant difference between PCa and normal samples is probably due to the relatively limited number of samples stored in HPA. In addition, the pan-cancer analysis demonstrates that the expression of all 9 genes is significantly upregulated in most cancers including PCa, which suggests the value of studying the 9 genes as potential diagnostic markers of PCa (Fig. 5b).

The well-studied PCa biomarkers
The well-studied PCa diagnostic markers identified in our bioinformatics study support the rationality of the study design.
Mitotic-specific cyclin-B1 (CCNB1) plays an important stabilizing role in the mitotic process during the G2 to M phase of the cell cycle (41). In IHC experiments, CCNB1 was found to be overexpressed in PCa specimens and acted as a prognostic marker for PCa chemotherapy (42). Prostate cancer cells overexpressing CCNB1 were found to be more sensitive to chemically induced apoptosis (43).
Centromere protein F (CENPF) is a basic element of the kinetochore complex and plays an important role in chromosome segregation during mitosis (44). CENPF was confirmed to be a prognostic marker in PCa, and its expression was up-regulated in patients with more severe PCa, i.e., a higher Gleason score, later pathological stage, and lymph node metastasis (45).
Thymidine kinase 1 (TK1) has been extensively studied as a diagnostic biomarker for various tumors including PCa (46). The serum level of TK1 was found to be significantly higher in PCa patients than in normal subjects, and the expression levels differed significantly between PCa patients at different stages (47). Increased TK1 levels in cancer patients after surgery or chemotherapy are associated with a worse prognosis (48).
Targeting protein for Xenopus kinesin-like protein 2 (TPX2) is a microtubule-associated protein that plays an important role in chromosome segregation during mitosis (49). It is a good biomarker for the diagnosis and prognosis of PCa. Pan HW et al. found, in IHC samples of human normal prostate tissues, PCa tissues and PCa cells, that TPX2 was not expressed in normal tissues but was highly expressed in PCa, which is consistent with our findings (50). TPX2 can be used as a prognostic biomarker as well. The time to biochemical recurrence was shorter in the TPX2 high-expression group, indicating a poor prognosis (51).

The relatively less-studied PCa biomarkers
Although they lack validation by the HPA database, the relatively less-studied PCa biomarkers identified in our bioinformatics study may provide novel and valuable insights into the diagnosis and prognosis of PCa.
Baculoviral IAP repeat-containing protein 5 (BIRC5) is a multitasking protein with dual roles in promoting cell proliferation and preventing apoptosis (52). Its role in the carcinogenesis and prognosis of PCa is not yet clear. According to Adisetiyo et al., the expression of BIRC5 is directly proportional to the tumor volume in a mouse model of PCa (53). According to a recent large clinical study, BIRC5 was increased in PCa, especially in advanced cases, but it is still unclear whether it is an independent PCa biomarker (54). These results are consistent with our findings that BIRC5 is highly expressed in PCa and is a high-risk gene for the prognosis of PCa.
Ribonucleotide reductase regulatory subunit M2 (RRM2) is an enzyme that regulates DNA synthesis and repair, which is important in carcinogenesis (55). RRM2 was highly expressed in patients with poorly differentiated or advanced PCa (p < 0.05), and RRM2 may be a prognostic marker for the risk of recurrence in patients with low-risk PCa (56). In patients with a Gleason score of 4-7 and no invasion of the tumor into the prostate capsule, Cox proportional hazards analysis revealed that the risk of recurrence was positively correlated with RRM2 protein expression levels (56). In a recent preclinical and clinical study, RRM2 was found to be a driver of poor PCa prognosis, and RRM2 knockdown could inhibit PCa development (57). Although RRM2 was not detected in either PCa or normal prostate IHC samples in the HPA database, RRM2 is highly likely to be a vital diagnostic and prognostic biomarker of PCa based on our analysis and recent findings.
DNA topoisomerase 2-alpha (TOP2A) encodes topoisomerase IIα, which controls DNA topology as well as cell cycle progression (58). According to Labbé et al., TOP2A can be a marker for PCa metastasis; they found that TOP2A was highly expressed in patients with metastatic PCa in cohort studies, in IHC staining of tissue sections from patients with different stages of PCa, and in mouse PCa cells (59).
According to Resende et al., TOP2A can serve as a prognostic marker in PCa (60). In PCa patients with high Gleason scores (i.e., poorly differentiated PCa), the levels of TOP2A were significantly higher, and biochemical recurrence-free survival was significantly shorter in patients with high TOP2A expression (P = 0.001) (60). Consistently, our results showed that TOP2A was highly expressed in PCa patients and was a high-risk gene for PCa prognosis.
ZW10 interactor (ZWINT) is a protein involved in kinetochore function (64). There is little clinical evidence illustrating the mechanistic link between ZWINT and PCa. In a bioinformatics analysis, ZWINT was found to be increased in PCa and was negatively correlated with miR-1 (65). In a study of PCa microarray data, ZWINT was also found to be upregulated in PCa and correlated with the PCa grade (66).
In our study, ZWINT was highly expressed in PCa and was positively correlated with poor prognosis, so it may be used as a potential biomarker for the diagnosis and prognosis of prostate cancer. However, more follow-up experiments are needed to verify and clarify its role in PCa.

Targets of natural products for treatment of PCa
Our study identified 4 potential curcumin targets (EP300, EGFR, RELA, NFKB1), 7 paclitaxel targets (PTEN, EGFR, ERBB2, TP53, KRAS, AR, AKT1), and 2 ursolic acid targets (GSK3B, RELA) that could be used for PCa treatment. Published studies suggest that they may be potential PCa-natural product targets.

Targets of curcumin for treatment of PCa
E1A-associated protein p300 (EP300) plays an important role in cell proliferation, cell cycle regulation, apoptosis, and DNA damage repair (67). It is highly expressed in advanced PCa such as CRPC and is accepted as a target for PCa treatment (68, 69). Although there is no direct evidence, EP300 may be a potential PCa target of curcumin. Curcumin was found to inhibit the acetyltransferase function of EP300 in human osteosarcoma cells (70). In an in silico docking analysis, a curcumin analogue was found to target EP300 and inhibit its acetyltransferase activity (71).
Epidermal growth factor receptor (EGFR) is actively involved in carcinogenesis processes such as tumor cell proliferation, angiogenesis, tumor invasion, metastasis and apoptosis (72). Curcumin may exert its anti-PCa effects by targeting EGFR. Co-delivery of curcumin and docetaxel by EGFR ligand nanoparticles has shown therapeutic effects on PCa (73). Combined administration of curcumin and phenylethyl isothiocyanate was found to inhibit the growth of PC-3 PCa cells, most probably by targeting the EGFR, Akt and NF-kappaB pathways (74).
Transcription factor p65 (RELA) is a subunit of the nuclear factor kappa-B protein, also known as p65, which can regulate inflammation, cell differentiation, proliferation and apoptosis (75). Only one reference was found indicating that curcumin could inhibit PCa through RELA. In an analysis of human PCa cell lines (PC3, DU145, LNCaP), curcumin was found to inhibit the growth of androgen-independent PCa cells through p65 inhibition (76). Our network pharmacology analysis strongly suggests the value of studying RELA as a novel potential PCa-curcumin target.
Nuclear factor NF-kappa-B p105 subunit (NFKB1) is another subunit of the kappa-B protein, also called p105/p50, which plays an important role in modulating innate and adaptive immunity, cell proliferation and apoptosis (77). Activation and translocation of NFKB1 are correlated with the progression of PCa (78).
In human PCa cells, curcumin was found to inhibit the growth of LNCaP and PC3 cells by blocking the transactivation of NFKB1 (78, 79).

Targets of paclitaxel for treatment of PCa
Phosphatase and tensin homolog deleted on chromosome ten (PTEN) plays an important role in suppressing carcinogenesis and tumor metastasis (80). The combined application of naringin and paclitaxel was found to inhibit the growth of DU145 PCa cells by increasing PTEN and decreasing NFKB1 (81). Paclitaxel induced more apoptosis in PTEN-positive 22Rv1 PCa cells than in the same cells with PTEN knockdown (82). These results suggest that PTEN can be used as a target for paclitaxel treatment of PCa.
EGFR is a common potential target for both curcumin and paclitaxel in the treatment of PCa.

Targets of ursolic acid for treatment of PCa
Glycogen synthase kinase 3β (GSK3B) is involved in biological processes such as energy metabolism, inflammation, and apoptosis (98). Only 1 report so far links ursolic acid, PCa and GSK3B. Ursolic acid was found to induce apoptosis in PC-3, LNCaP and DU145 PCa cells with enhanced phosphorylation of GSK3B, and the apoptotic effects could be reversed by the GSK3B inhibitor SB216763 (99). In HepG2 liver cancer cells, ursolic acid can also induce apoptosis by targeting and regulating the phosphorylation of GSK3B (100). Hence GSK3B is probably a novel potential target for ursolic acid in the treatment of prostate cancer.
Transcription factor p65 (RELA) is a common potential PCa therapeutic target of curcumin and ursolic acid, whose mechanism of action deserves further study. Direct evidence that ursolic acid inhibits PCa through RELA is limited. Shanmugam et al. found that ursolic acid could induce apoptosis in DU145 and LNCaP PCa cells in a dose-dependent manner by inhibiting the activity of NF-κB and the phosphorylation of p65 (RELA) (101).

Conclusions
In this study, by analyzing two PCa datasets from the GEO database, with validation by GEPIA and HPA, 9 genes were identified as closely associated with the development and prognosis of PCa. All nine markers were verified by GEPIA to indicate poor prognosis for PCa patients. The 4 identified biomarkers verified by HPA (CENPF, TPX2, TK1 and CCNB1) support the rationality of our study design, and the 5 relatively less-studied PCa biomarkers identified (RRM2, UBE2C, TOP2A, BIRC5 and ZWINT) may provide novel and valuable insights into the diagnosis and prognosis of PCa. The carcinogenesis of PCa was found to be mainly associated with the hepatic fibrosis pathway, ILK signaling, the NRF2-mediated oxidative stress response and many others. By network pharmacology, after the integration of data on small-molecule natural products acting on PCa, 4 targets of curcumin (EP300, RELA, EGFR, NFKB1), 7 targets of paclitaxel (PTEN, EGFR, ERBB2, TP53, KRAS, AR, AKT1), and 2 targets of ursolic acid (GSK3B, RELA) were identified.
Our study reveals potential key biomarkers for the diagnosis and prognosis of PCa and demonstrates three valuable natural products for molecular targeting of PCa, which lays a promising foundation for further study of novel biomarkers and molecularly targeted drugs for PCa.

... followed the principles of the Declaration of Helsinki. All datasets analyzed were obtained from publicly available databases, hence written informed consent was not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data involved in the study are included in this manuscript and its supplementary files.

Competing Interests
The authors declare that there are no conflicts of interest.

Author Contribution
WL designed the project, analyzed the data, and drafted and revised the manuscript. WX did most of the bioinformatics analysis and modified the manuscript. KS, FW and TWW performed the literature search and revised the manuscript.

Heatmap for both the GSE3325 and GSE46602 datasets. Each column represents one dataset and each row represents one gene. Color changes from blue to red represent a transition from downregulation to upregulation of expression (first 100 targets displayed).