The Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) at the National Center for Biotechnology Information (NCBI) is a public genomics data repository of gene expression, chip, and microarray data. GSE datasets were included in the study if they met the following criteria: (1) the samples had complete gene expression data from high-throughput sequencing that could be downloaded from the GEO database; (2) the dataset included both BPH and PCa samples; and (3) BPH and PCa samples were clearly defined. Three datasets, GSE5377, GSE104749, and GSE30994, met these criteria and were downloaded from the GEO database. GSE5377 included 3 BPH samples and 17 PCa samples, GSE104749 included 4 BPH samples and 4 PCa samples, and GSE30994 included 3 BPH samples and 3 PCa samples. Overall, 10 BPH and 24 PCa samples were enrolled in our study.
Data handling and DEGs searching
The raw data were obtained and normalized using R software. According to the platform annotation files, probe IDs in the expression matrix were replaced by the corresponding gene IDs; when multiple probes corresponded to the same gene, their average value was calculated in R and used for further analysis. Each dataset was then screened for differential expression using the limma R package, and genes with an adjusted P-value <0.05 and |log2 fold change (FC)| >1 were considered DEGs. We then used the online Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) to identify the integrated DEGs. In addition, the lists of up-regulated and down-regulated genes were downloaded for further study.
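The probe-collapsing, thresholding, and integration steps can be sketched as follows. This is a minimal illustration in Python rather than the R/limma workflow used in the study; the function names and the input layout are hypothetical, and it assumes that "integrated DEGs" means the genes shared by all datasets.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average the per-sample values of probes that map to the same gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # probes without a gene annotation are dropped
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


def select_degs(results, p_cut=0.05, lfc_cut=1.0):
    """Split genes passing adj. P < p_cut and |log2FC| > lfc_cut into up/down sets.

    `results` maps gene -> (log2FC, adjusted P), e.g. exported from limma.
    """
    up, down = set(), set()
    for gene, (log2fc, adj_p) in results.items():
        if adj_p < p_cut and abs(log2fc) > lfc_cut:
            (up if log2fc > 0 else down).add(gene)
    return up, down


def integrate(deg_sets):
    """Genes present in every dataset's DEG set (the Venn overlap)."""
    return set.intersection(*deg_sets)
```

Applying `select_degs` to each dataset and passing the resulting sets to `integrate` reproduces the overlap region of the Venn diagram under the stated assumption.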
GO and KEGG pathway analysis of DEGs
The Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.8 (https://david.ncifcrf.gov/) was used to perform GO functional and KEGG pathway analyses of the integrated DEGs. The GO functional analysis of the integrated DEGs covered three categories: biological processes (BP), cell components (CC), and molecular functions (MF). P<0.05 was considered statistically significant.
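To illustrate what an enrichment P-value of this kind measures, the sketch below computes a plain one-sided hypergeometric test for a single GO term or pathway. It is a simplified stand-in: DAVID reports a modified Fisher's exact P-value (the EASE score), which is more conservative, and the function name and argument names here are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    k: DEGs annotated to the term
    n: total DEGs in the list
    K: background genes annotated to the term
    N: total background genes
    """
    upper = min(n, K)
    total = comb(N, n)
    # Upper tail of the hypergeometric distribution
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

A term would be reported as enriched when this value falls below the chosen cutoff (P<0.05 in this study).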
PPI network and module analysis
A PPI network of the integrated DEGs was constructed using the online Search Tool for the Retrieval of Interacting Genes (STRING) database (http://www.string-db.org/) with the default medium confidence score (0.4). This network helped us identify the key genes and critical gene modules involved in the progression from BPH to PCa. Cytoscape software was used to reconstruct the PPI network, and module and GO analyses were carried out with two Cytoscape plug-ins, Molecular Complex Detection (MCODE) and the Biological Network Gene Ontology tool (BiNGO), to clarify the biological significance of the gene modules in the progression from BPH to PCa. P<0.05 indicated a significant difference, and these genes were designated as hub genes.
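The effect of the confidence cutoff can be illustrated with a short sketch that keeps only interactions scoring at least 0.4 and then counts the connections of each node. Ranking by node degree is a simpler substitute used here for illustration only; it is not the MCODE clustering applied in the study, and the edge-list layout and function names are assumptions.

```python
from collections import Counter


def filter_edges(edges, min_score=0.4):
    """Keep interactions whose combined score meets the confidence cutoff.

    `edges` is a list of (protein_a, protein_b, score) with score in [0, 1].
    """
    return [(a, b) for a, b, score in edges if score >= min_score]


def node_degrees(edges):
    """Count how many retained interactions each protein takes part in."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree
```

Calling `node_degrees(filter_edges(edges)).most_common()` lists the most connected proteins first, which gives a rough first look at candidate hubs before module detection.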
Construction of risk prediction model and survival analysis
Hub gene expression in normal prostate specimens and PCa tissues was compared using Gene Expression Profiling Interactive Analysis (GEPIA; http://www.gepia.cancer-pku.cn/), which is based on the TCGA database. Logistic regression was performed to estimate the hazard ratios of hub gene changes leading to PCa. A nomogram was built to predict the risk value of the hub genes, and a forest plot was used to show the hazard ratios more intuitively. Moreover, the prognostic value of each hub gene was evaluated with GEPIA, including overall survival (OS) and disease-free survival (DFS).
Construction of the diagnostic model and decision curve analysis
To further analyze the diagnostic value of the hub genes for PCa, we collected hub gene expression and clinical data from the TCGA database (https://portal.gdc.com). GraphPad Prism 7 (GraphPad Software, Inc., San Diego, CA) was used to draw the receiver operating characteristic (ROC) curves, and decision curve analysis (DCA) was carried out in R.
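The two quantities behind these plots can be sketched in a few lines. The example below computes the area under the ROC curve through its rank-based (Mann-Whitney) interpretation and the net benefit used in DCA at a single threshold probability. It is an illustrative Python sketch, not the GraphPad Prism or R procedure used in the study; the function names are hypothetical and labels are assumed to be coded 1 for PCa and 0 for non-PCa.

```python
def auc(scores, labels):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs, labels, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds and plotting the result against the "treat all" and "treat none" strategies yields the decision curve.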
Expression of hub genes at different tumor stages
The Tumor-Node-Metastasis (TNM) classification of malignant tumors is commonly used to assess tumor severity. Hub gene expression at different TNM stages of PCa was analyzed using TCGA data.
Validation of hub gene expression based on Chinese PCa patients and different databases
We downloaded RNA-sequencing data of Chinese PCa patients from the Chinese Prostate Cancer Genome and Epigenome Atlas (CPGEA; http://www.cpgea.com/) and examined hub gene expression in these patients. We further analyzed hub gene expression in normal prostate samples and PCa specimens using the UALCAN database (http://ualcan.path.uab.edu/) and The Human Protein Atlas (https://www.proteinatlas.org/) [20, 21].
Clinical specimen collection
All PCa patients who participated in the study had a confirmed history of BPH. The sample collection methods were approved by the Ethics Committee of Tongji Hospital, School of Medicine, Tongji University (SBKT-2021-220). Patients who provided samples were informed of the experimental process and gave informed consent.
Cell culture and transfection
C4-2 PCa cells were purchased from the Chinese Academy of Science Cell Bank (Shanghai, China). C4-2 cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium (Sigma, Darmstadt, Germany, Catalog No. R8758) supplemented with 10% fetal bovine serum (FBS) (Gibco, Thermo Fisher Scientific, Waltham, MA, USA, Catalog No. 10091). C4-2 cells were transfected with MYC knockdown (shMYC) plasmids, MYL9 overexpression (oeMYL9) plasmids, and SNAI2 overexpression (oeSNAI2) plasmids (constructed by Fenghbio, Hunan, China) using Lipofectamine 2000 (Thermo Fisher Scientific, Catalog No. 11668019) according to the manufacturer's instructions. The shMYC plasmid sequence was CCTGAGACAGATCAGCAACAA.
Rabbit monoclonal antibodies against c-MYC (Catalog No. ab32072) and MYL9 (Catalog No. ab191312) and mouse monoclonal antibodies against SNAI2 (Catalog No. ab51772) and GAPDH (Catalog No. ab8245) were purchased from Abcam (Cambridge, UK). HRP AffiniPure Goat Anti-Rabbit IgG (Catalog No. A0216) and HRP AffiniPure Goat Anti-Mouse IgG (Catalog No. A0208) were purchased from Beyotime Biotechnology (Shanghai, China).
RNA extraction and qRT-PCR
Total RNA was extracted from patient tumor and para-cancerous tissues and from cells using TRIzol Reagent (Sigma–Aldrich, St. Louis, MO, USA, Catalog No. T9424). cDNA was synthesized using a reverse transcription kit (Advantage® RT-for-PCR Kit, Takara Bio Inc., Kusatsu, Japan, Catalog No. 639505). Finally, cDNA was quantified by real-time PCR using a commercial kit (TB Green® Premix Ex Taq™ II, Takara Bio Inc., Catalog No. RR420A) according to the manufacturer's instructions. The primers for c-MYC, MYL9, SNAI2, and GAPDH are listed in Table 1. The 2^−ΔΔCt method was used to quantify relative mRNA expression levels.
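The 2^−ΔΔCt calculation can be written out as a short worked example. The sketch assumes GAPDH serves as the reference gene (it is the only housekeeping gene listed among the primers) and that the control group is the calibrator; the function name and the Ct values in the usage note are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene), computed per group
    ddCt = dCt(sample) - dCt(control)
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** -dd_ct
```

For instance, with made-up Ct values of 24 (target) and 18 (reference) in the sample and 26 and 18 in the control, ΔΔCt is −2 and the function returns a four-fold higher expression in the sample.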
Protein was extracted from tissues and cells with RIPA lysis buffer. Protein samples were treated with Dual Color Protein Loading Buffer (Thermo Fisher Scientific, Waltham, MA, USA, Catalog No. NP0007). SDS–PAGE gels (10% and 15%) were used to separate the proteins, which were then transferred to nitrocellulose membranes (Merck KGaA, Darmstadt, Germany, Catalog No. 71078). Protein-Free Rapid Blocking Buffer (Thermo Fisher Scientific, Catalog No. 37584) was used to block the membranes. The membranes were then incubated overnight at 4°C with primary antibodies against c-MYC (1:1000), MYL9 (1:1000), SNAI2 (1:1000), and GAPDH (1:1000). The next day, the membranes were washed three times with 1× TBST (10 min each) and then incubated at room temperature for 1 h with the matched secondary antibody (1:1000). Lastly, the membranes were exposed to X-ray film (FluorChem R, ProteinSimple, California, USA).
The expression of MYC, MYL9, and SNAI2 in clinical specimens was detected by immunohistochemistry (IHC). Tumor samples were fixed in formalin and embedded in paraffin. Four-micrometer-thick sections were cut from the samples and fixed. Sections underwent antigen retrieval, and immunostaining was performed as described using anti-MYC antibody (1:1000), anti-MYL9 antibody (1:400), and anti-SNAI2 antibody (1:500). Two experienced pathologists, blinded to tissue information, independently evaluated and scored the IHC staining intensity.
Cell invasion assay
After 48 h of transfection, approximately 1 × 10^5 C4-2 cells in 150 µL of RPMI 1640 medium containing 2% FBS were placed in the upper chamber, and RPMI 1640 medium containing 10% FBS was placed in the lower chamber. After 48 h, the cells were fixed with 4% paraformaldehyde, stained with crystal violet, and observed under an Olympus microscope (Olympus Corp., Tokyo, Japan). ImageJ was used to count the cells.
Cell proliferation assay
After 48 h of transfection, about 1000 C4-2 cells were seeded in each well of a 96-well plate. Each group was tested in triplicate. Cell proliferation at 0, 24, 48, and 72 h was measured with Cell Counting Kit-8 (CCK-8) (Solarbio, Beijing, China, Catalog No. CA1210). The optical density (OD) at 450 nm was measured with a microplate reader (LD942, Beijing, China).
The matrix data were processed with R version 4.0.2 (Institute for Statistics and Mathematics, Vienna, Austria; https://www.r-project.org). For descriptive statistics, mean ± standard deviation was used for continuous variables with normal distributions, whereas the median (range) was used for continuous variables with non-normal distributions. Categorical variables were described by counts and percentages. Hazard ratios (HRs), 95% confidence intervals (95% CIs), and P values were used as statistical metrics. A two-tailed P<0.05 was considered statistically significant.