Emerging biomarker associated with prognosis of prostate cancer identied via co-expression analysis

Background: Forkhead box protein f1 (Foxf1) is associated with oncogenesis, and maybe play a key role in prostate cancer. However, the effect of Foxf1 on the development of prostate cancer and its clinical implications were still unknown. Method: The transcriptome data and clinical information were downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). The differential co-expression genes which were the most closely correlated with prostate cancer were identied by weighted gene co-expression network analysis (WGCNA) and differential expression analysis method. The univariate cox regressions were performed to identify the prognostic genes. Finally, Foxf1 were veried in clinical tumor samples and matched normal tissues. Results: A total of 108 differential co-expression genes were identied. Foxf1 was identied that signicantly associated with overall survival rate and biochemical recurrence of prostate. Then, protein-protein interaction (PPI) networks were constructed, including 21 nodes and 26 edges. The data revealed GAPVD1, MIR522, MIR937, CPS1, HOXB9, FUBP3, MIR936, SDF4, SATB2, HOXB13, MIR375, PUSL1, SATB1, MIR382, MIR429, HOXD13, PRSS50, FLNC, STRBP, KHSRP were signicant associated with the regulation and function of differentially expressed Foxf1 in prostate. The experimental results conrmed the Foxf1 were downregulated in the prostate cancer tissues compared with the matched normal tissues. Conclusion: we obtained a gene Foxf1 which is signicantly related with the prognosis of prostate cancer, Foxf1 have signicant values in predicting the patients’ Overall survival and may serve as a potential prognosis biomarker and a new therapeutic target of prostate cancer.


Introduction
Prostate Cancer is the most common malignant tumor in the male genitourinary system [1]. In the United States, the incidence rate is highest, while the disease mortality rate accounts for 9% of all malignant tumor deaths in men, second only to lung cancer [2]. According to the latest epidemiological statistics, in the United States, nearly 164,690 new cases of prostate cancer were diagnosed in 2018, and nearly 29,430 people were died because of prostate [3]. Prostate cancer lack of speci c clinical symptom as the tumor grows quietly, and the failure to detect tumor early cause the patients lost the opportunity for curative surgery [4]. Today, prostate cancer was diagnosed by detecting the prostate speci c antigen (PSA) and digital rectal examination and eventually con rmed by needle biopsies [5]. However, the PSA screening has low speci city although it has high sensitivity. This results in a negative needle biopsies and cause overtreatment and unnecessary injury [6,7]. Thus, there is an urgent need for the development of e cient biomarkers.
With the development of transcriptome, bioinformatics has become more popular for identify molecular mechanisms of diseases and discover potential biomarkers [8,9]. Weighted gene co-expression network analysis (WGCNA) can identify gene potential function and extract gene co-expression modules according to the clinical pro le [10]. This algorism calculates the correlation coe cient value of expression level and then takes an exponentiation, which makes the genes network to meet the scale-free distribution [11]. Brie y, we could obtain interested gene modules which have high correlation with clinical pro le. Furthermore, differential gene expression analysis provides methods for discovering signi cant differentiation in gene expression level and understanding the molecular mechanisms underlying genome regulation between cancer tissues and adjacent normal tissues [12]. Thus, the highly related genes that are found by using WGCNA and differential gene expression analysis are useful to serve as e cient biomarkers.
In this study, the gene expression data pro les of TCGA-PRAD and GSE70768 were analyzed through WGCNA and differential gene expression analysis to get hub genes. Then, we performed functional enrichment analysis and protein-protein interaction (PPI) analysis to explore the underling mechanism.

Method
The work ow of date preparation, analysis, and validation is shown in gure 1.

Data preparation
The level 3 RNA sequencing data and clinical information of prostate cancer were downloaded from the TCGA(https://portal.gdc.cancer.gov/) up to November 9, 2020. There were 552 samples, which including 500 prostate cancers and 52 normal tissues. As suggested by the "limma" R package [13], the gene expression pro les of Pca were normalized. RNA sequence data of another 199 samples were obtained from GEO(https://www.ncbi.nlm.nih.gov/geo/) using R package GEOquery [14], which including 126 prostate cancers and 73 corresponding normal tissues.
Weighted correlation network analysis (WGCNA) was utilized to select candidate biomarkers and therapeutic targets [15,16]. The gene expression data pro les of TCGA-PRAD and GSE70768 were formed which each raw displays a different gene and each column displays a sample. The clinical feature table contained tumor and normal sample.
First, outlier samples by hierarchical clustering were delete, then build a scale-free network, the best soft threshold was obtained according to the value of mean connectivity and scale-independent. The adjacency matrix was created by calculating the correlation value of every two genes. To further select functional modules in the co-expression network, the association value between gene modules and the clinical modules was calculated. The gene module with high correlation with clinical pro les was identi ed.
Differential Expression Analysis and Interaction with the Modules of Interest "limma" R package which provides a method for differential expression analyses on RNA-sequence data, was used to nd the differentially expressed genes (DEGs) between PRAD and corresponding normal tissues in TCGA and GSE70768. The DEGs of the TCGA-PRAD and GSE70768 were displayed as a volcano plot by utilize the "ggplot2" R package [17]. Then, the overlapping genes of DEGs and interesting modules were identi ed as potential prognostic genes, which visualized as a venn diagram using the online software(http://sangerbox.com/Tool).

Functional Annotation for Genes of Interest
To conduct Gene Ontology(GO) and Kyoto Encyclopedia of Genes and Genomes(KEGG) of selected genes, the "clusterPro ler" R package was utilized to explore the functions among the overlapping genes, with p value 0.05 [18]. GO annotation contains the three sub-ontologies-biological process (BP), cellular component (CC), and molecular function (MF), which can identify the biological properties of interested genes for all organisms [19].

Screening of Hub Gene and Construction of PPI
In our study, the "survival" R package was utilized to determine survival curves, the clinical data of these Pca patients were came from TCGA-PRAD. Kaplan-Meier curve was used to evaluate univariate survival analysis, with P-value 0.05. The genes that have a signi cant correlation with prognosis were considered as hub genes. Then, the cytoscape (v3.8.1) was utilized to construct the PPI network of overlapping genes [20]. In a co-expression network, the relationship's degrees were calculated by the algorithm of CytoHubba, which is a plugin in Cytoscape [21].

Veri cation of the Expression Patterns and the Prognostic Values of Hub Genes
To a rm the reliability of the hub gene in our study, we veri ed the expression level of the target gene in prostate cancer and adjacent tissues, also con rm the gene expression in different nodal metastasis status by using the online tool Ualcan (http://ualcan.path.uab.edu/cgi-bin/ualcan-res.pl) [22] and the Oncomine (https://www.oncomine.org/resource/login.html) . based on the data from the TCGA and CamcAPP(https://bioinformatics.cruk.cam.ac.uk/apps/camcAPP/) [23], the relationship between overall survival (OS) and the hub gene were con rmed by using the survival package in R tool.

Immunohistochemistry (IHC) staining
Brie y, prostate cancer tissues and adjacent normal tissues were stained via the following steps: the tissues were xed by formalin, dehydrated and para n embedded. Then immunostaining of Foxf1 was carried out with rabbit polyclonal anti-Foxf1 antibody (1:100, SAB, #36858). Then two experienced pathologists assessed the staining independently.

Construction of weighted Gene Co-expression modules
In order to identify the function modules in PRAD patients. Weighted gene co-expression analysis (WGCNA) was constructed from the TCGA-PRAD and GSE70768 with the WGCNA R package. The 11 modules were generated in TCGA-PRAD and 11 modules were generated in GSE70768 where the grey module was not assigned into any cluster. Then heatmap was plotted to show the relationship between different gene modules and the two clinical pro les. The results of the module-clinical pro le relationship were displayed in gure 2 and gure 3 which can be clearly revealed that the cyan gene module had closeness with clinical pro le in TCGA-PRAD, and the thistle2 gene module had the strongest relationship with clinal pro le in GSE70768. The correlation coe cients between the cyan gene module and clinical pro le were 0.48, and the correlation coe cients between thistle2 gene module and clinical pro le was 0.64. then, the results of the Pearson correlation coe cients between cyan/thistle2 gene module and clinical pro le were 0.57/0.8(p 0.01). It is clearly certi ed that the cyan/thistle2 gene module had strongly relationship with the clinical pro le.
Identi cation of hub gene A total of 6088 genes were differentially expressed between tumor tissues and corresponding normal tissues in the TCGA database ( Figure 4A), and 563 genes were differentially expressed in GSE70768 database ( Figure 4B). 1230 and 2513 co-expression genes were identi ed in the cyan and thistle2 modules respectively. The interactive network among this gene including 108 genes, visualized by venn diagram ( gure 4C), which indicated that these genes were the hub genes. Then we further explored the relationship between this genes and prostate cancer overall survival (OS). The result revealed that Foxf1 was the target gene in prostate cancer. subsequently, we utilized the CamcAPP to verify the relationship between the expression of Foxf1 and the biochemical recurrence of prostate cancer, which had a meaningful relationship.

Low expression and hypermethylation of Foxf1 in prostate cancer
We explore the expression of Foxf1 in prostate cancer and adjacent tissues by utilizing Ualcan, visualized as a box plot graph (Fig 6D). we further explored the expression level of Foxf1 in prostate cancer based on nodal metastasis status. Foxf1 was lower in cancer tissues than adjacent tissues, and lower in nodal metastasis tissues (Fig 6E). In addition, the analysis from Ualcan demonstrated that Foxf1 were signi cantly higher methylation in cancer tissues compared with normal tissues (P < 0.05, Fig 6F).
Furthermore, the protein expressional levels of the Foxf1 gene were signi cantly lower in prostate cancer tissues compared with the corresponding normal tissues based on the immunohistochemistry staining (Fig 5).
GO and KEGG pathway analysis of the 108 Genes GO and KEGG pathway analysis were performed to obtain further insight into the potential function of the identi ed 108 genes. The analysis results were showed in gure 7. Among the most enriched GO terms were cell-substrate adhesion and focal adhesion. KEGG pathway analysis shows that hub genes are highly correlate with proteoglycans in cancer.

Discussion
Prostate cancer is the most common malignant disease in men and it was diagnosed in 12% and accounted for 9% of male cancer deaths [24]. Nowadays, many studies endeavored to identify the hub genes which play an important role on the development of prostate cancer from the transcriptome level [25]. In this study, a total of 108 signi cant genes were elected by WGCNA method, which was utilized to analysis the relationship between prostate cancer transcriptome and clinical pro le [26]. As displayed in functional annotation analysis by the R package "clusterPro ler", these genes were mainly in cell-substrate adhesion and focal adhesion, which may cause local metastasis of the tumor. Then, according to the univariable cox analysis, the Foxf1 gene was carried out. Furthermore, the protein-protein interactions (PPIs) network was screened out according to MCC scores from the cytoHubba plugin in Cytoscape.
Foxf1 is a member of the forkhead box family of transcription factor and has been previous reported to be critical for multiple human cancer, such as colorectal cancer [27], gastrointestinal stromal tumor [28] breast cancer [29,30] and non-small cell lung cancer [31]. we found that the downregulation of Foxf1 was signi cantly correlated with a poorer prognosis for patients with prostate cancer. Then we further evaluated the correlation between the Foxf1 expression and the pathological characteristics, we found that Foxf1 downregulation was correlated with positive nodal invasion. But the mechanism of Foxf1 dysregulation is still unclear. Previous study reported that the expression level of Foxf1 was adjusted by multiple genomic and epigenetic mechanisms. In this study, we found Foxf1 were signi cantly higher methylation in prostate cancer tissues compared with corresponding normal tissues, indicating that hypermethylation of Foxf1 may causes the Foxf1 gene to be lowly expressed in prostate cancer. These nding indicated that Foxf1 might be a potential prognostic biomarker in prostate cancer.

Conclusions
In conclusion, we obtained a gene Foxf1 which is signi cantly related with the prognosis of prostate cancer, Foxf1 have signi cant values in predicting the patients' Overall survival and may serve as a potential prognosis biomarker and a new therapeutic target of prostate cancer. Further studies are necessary to con rm these results in our study. All transcriptome and clinical data were available in TCGA database. All the experimental data analyzed and displayed in the present manuscript are generated for this study are included in the article material.

Ethics statement
This article had been reviewed and approved by the Ethics Committee of Huashan Hospital of Fudan university.
Author contribution FY, YL, ZC contributed equally to this work. All authors contributed towards experiments and data analysis, and agreed to take responsibility for all aspects of the job.

Con ict of interest
The authors declare no con icts of interest.

Funding
This study was supported by the National Natural Science Foundation of China (Grant Numbers: 81872102).