Sample characteristics and sequencing data summary of diseased prostate samples: To understand the changes of microbial compositions associated with prostate cancer (PCa) development, we prospectively collected a total 46 tissue biopsy samples from 13 benign prostatic hyperplasia (BPH) and 33 PCa patients (‘Discovery Cohort’, Table S1). The total DNA of each specimen from the ‘Discovery Cohort’ was extracted and subjected to 16S rRNA amplicon based sequencing on an Ion GeneStudio S5 System targeting the hypervariable regions of 16S rRNA gene using two distinct primer sets as per manufacturer’s instructions. While Set I primer was used for amplification of hypervariable regions including V2, V4 and V8, set II primer was used for hypervariable regions including V3, V6-7 and V9. Additionally, as described later, in order to validate the sequencing data, we also collected another 31 tissue biopsy samples from 16 BPH and 15 PCa patients (‘Validation Cohort’, Table S1).
A total of 35,320,332 raw reads were generated after sequencing of the 46 samples from the ‘Discovery Cohort’. The number of raw sequence reads varied by approximately 10 fold across samples. After quality trimming and chimera checking, 30,641,026 high quality reads were mapped to two comprehensive 16S rRNA reference databases – Greengenes27 (v13.5) and the Thermo Fisher Scientific in-house MicroSeq 50028 (v2013.1). 6,301,956 reads were excluded from the analyses due to low copy number reads (less than 10 copy numbers) and 41,312 reads were found to be un-mapped in both databases (Table S2). 24,297,758 mapped reads subsequently identified Operational Taxonomic Units (OTUs) at the level of family (95% and above similarity index), genus (97% and above similarity index) and species (99% and above similarity index). Overall, a total of 6 phylum, 127 genera and 291 bacterial species were identified from all 46 samples in the discovery cohort.
Overall bacterial abundance and diversity among BPH and PCa samples: FASTQ files from 16S rRNA gene amplicon sequencing data of all 46 samples were analyzed using a web-based online tool - MicrobiomeAnalyst for statistical analysis29. Rarefaction curves of species richness against sequences per sample were plotted for BPH and PCa samples to determine the efficiency of the sequencing process (Fig. 1A-B). Most of the samples, though not completely, reached a saturated plateau phase, indicating that the depth of sequencing was sufficient for the diversity analysis (Fig. 1A-B). Both observed species (p = 0.015) and Chao1 index (p = 0.034) showed that species richness was significantly decreased in PCa samples as compared to BPH samples (Figs. 1C and 1D, respectively). In addition, the diversity estimators both Shannon index (p = 0.158) and Simpson index (p = 0.411) indicated a trend of depletion in relative diversity of species composition in PCa samples as compared to BPH samples, although the data was not statistically significant (Figs. 1E and 1F, respectively).
To assess the diversity among two groups, we evaluated weighted UniFrac distance matrix 30 from the OTU abundance through utilizing Permutational Multivariate Analysis of Variance (PERMANOVA) algorithm 31 and subsequently applied in Principal Component Analysis (PCoA) 32 (Fig. 1G). Given that PCoA analyses revealed no significant difference (p = 0.082) in the bacterial compositions between BPH and PCa samples (Fig. 1G), a ‘Random Forest’ algorithm was applied to further confirm the difference in bacterial community among the BPH and PCa biopsy samples (Fig. 1H). The decision trees extracted from the random forest classification identified distinct bacterial composition in 2 BPH samples and 11 BPH samples exhibited overlapping species with PCa samples (class error: 0.850; Fig. 1H). In contrast, all 33 PCa samples demonstrated unique bacterial compositions (Fig. 1H).
Taxonomic characterization of bacterial compositions among BPH and PCa samples: The bacterial communities associated with the diseased prostate lesions were further analyzed at different taxonomic levels. Six phyla including Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, Fusobacteria and Deinococcus-Thermus collectively comprised of the entire sequences in both groups (Fig. 2A-B). In general, Proteobacteria was the most abundant phylum in the prostate microbial ecology, contributing ~40.6% in diseased prostate lesions (Fig. 2A-B). Overall, the results indicated that only Actinobacteria phylum was significantly depleted in PCa samples as compared to the BPH category (Fig. 2A-B). At the genus level, Prevotella, Cupriavidus, Propionibacterium, Acinetobacter and Corynebacterium represented the top five genera in diseased prostate lesion, comprising of 19.94%, 13.54%, 6.54%, 5.8% and 5.23% sequence coverage, respectively in BPH and 26.85%, 14.07%, 3.97%, 8.78% and 3.5% sequence coverage, respectively in PCa samples (Figs. 2C-D and S1). Of all genera detected, both groups shared approximately half of the total genera identified i.e. 56/107 in BPH category and 56/118 in PCa lesions (Fig. S1).
To identify the differentially enriched genera within BPH and PCa samples, we employed Linear Discrimination Analysis (LDA) Effect Size (LEfSe) method 33 (Fig. 2E). LEfSe analysis of top 20 bacterial genera identified Kocuria, Staphylococcus, Corynebacterium, Cellvibrio, Pseudomonas, Paracoccus, Brachybacterium, Pseudoxanthomonas, Anaerococcus, Stenotrophomonas, Microvirga, Empedobacter, Lysobacter, Brevibacterium, Comamonas, Serinicoccus, Rhodobacter, Chryseobacterium and Aeromicrobium as enriched genera in BPH samples, whereas only Bradyrhizobium genus was found to be significantly elevated in PCa samples (Fig. 2E). Overall, the results indicated that enriched diversity of bacterial taxa within diseased prostate lesions was mostly contributed by BPH samples.
Comparable enrichment analysis at species level identified significant bacterial compositions between BPH and PCa samples: Detailed analyses of bacterial compositions at species level demonstrated Prevotella copri, Cupriavidus campinensis, Propionibacterium acnes and Paracoccus sp. covered almost 50% of species variety among diseased prostate lesions including both BPH and PCa samples (Figs. 3A-B). Next, as similar to the genus level, cladogram and LEfSe analyses were conducted to further uncover the differential species compositions between BPH and PCa tissue biopsy samples (Figs. 3C-D). The LDA scores demonstrated that among top 20 most significantly enriched species Kocuria palustris, Cellvibrio mixtus, Pseudomonas stutzeri, Paracoccus sp, Staphylococcus hominis, Corynebacterium tuberculostearicum, Brachybacterium paraconglomeratum, Staphylococcus arlettae, Staphylococcus cohnii and Anaerococcus octavius were notably elevated in the BPH samples, while Cupriavidus taiwanensis, Methylobacterium organophilum, Brevundimonas vancanneytii, Neisseria flavescens, Acinetobacter junii, Bradyrhizobium cytisi, Cupriavidus basilensis, Caulobacter segnis, Leclercia adecarboxylata and Neisseria elongata were significantly increased in the PCa samples (Fig. 3D).
Validation of species identified in 16S rRNA sequencing by quantitative real-time PCR analyses in BPH and PCa samples: Although 16S rRNA amplicon based sequencing can produce robust data regarding the presence and abundance of bacterial species, horizontal gene transfer, multiple copies of 16S rRNA and other limitations can generate error prone abundance data 8,34. To confirm the 16S rRNA sequencing data, quantitative real-time PCR (qPCR) analyses were performed of top 2 bacteria in each category – BPH (K. palustris and C. mixtus) and PCa (C. taiwanensis and Methylobacterium organophilum), along with 3 most abundant bacteria (P. copri, C. campinensis and P. acnes) identified in diseased prostate lesions, using two distinct species-specific primers (Fig. 4). Moreover, in order to further corroborate the results, we used two sample cohorts – ‘Cohort - 1’ (Discovery cohort, used in 16S rRNA amplicon based sequencing) and ‘Cohort - 2’ (Validation cohort) as described in Table S1. As similar to LEfSe analyses between BPH and PCa biopsy samples, qPCR data further confirmed Kocuria palustris and Cellvibrio mixtus as BPH specific and Cupriavidus taiwanensis, and Methylobacterium organophilum as PCa specific bacterial species in both sample cohorts (Figs. 4A-B and 4C-D, respectively). In contrast to 16S rRNA sequencing results, qPCR analyses demonstrated Prevotella copri, Cupriavidus campinensis and Propionibacterium acnes were significantly enriched in PCa samples as compared to the BPH group in both sample cohorts (Fig. 4E-G).
Functional prediction of altered microbiome associated with the PCa development: In order to visualize the functional effects resulting from the altered microbial community associated with disease progression in prostate, we employed Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) software 35. PICRUSt analyses can predict the functional Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways related to the composition of a metagenome, and have been demonstrated to provide a good representation of metagenomic prediction 36. The LEfSe outputs of KEGG pathways among BPH and PCa tissue biopsy samples identified functions related to starch and sucrose metabolism, galactose metabolism, carbohydrate metabolism, primary immunodeficiency, ubiquitin system, Ion channels, proteasome, phenylpropanoid biosynthesis, electron transfer carriers, glycan degradation and N-Glycan biosynthesis were significantly associated with BPH condition, while pathways such as nitrotoluene degradation, steroid hormone biosynthesis, non-homologous end-joining and primary bile acid biosynthesis were directly linked to PCa development (Fig. 5A). PCoA also demonstrated that the predicted functions of bacterial compositions among BPH and PCa were significantly clustered (p < 0.05) (Fig. 5B).
Quantitative real-time PCR analyses detected strong association of multiple human tumor viruses with PCa progression: Studies suggest that a number of human oncogenic viruses including human papilloma viruses (HPVs) and Epstein-Barr virus (EBV) are associated with PCa development18,20,37. To evaluate the potential involvement of viral etiology in our samples we designed qPCR primers for seven human tumor viruses including EBV, two high risk HPVs – HPV-16 and HPV-18, hepatitis B virus (HBV), human T-cell leukaemia virus type 1 (HTLV-1), hepatitis C virus (HCV), Kaposi’s sarcoma associated herpesvirus (KSHV) and Merkel cell polyomavirus (MCPyV) along with two more human polyomaviruses – JC virus (JCV) and BK virus (BKV) (Table S3). qPCR analyses of cohort-1 with primer set-I comprising of EBV encoded EBNA3A (Gene ID: 3783762), HPV-16 encoded E2 (Gene ID: 1489080) and HPV-18 encoded E6 (Gene ID: 1489088), HBV encoded polymerase (Gene ID: 944565), HCV encoded polyprotein (Gene ID: 951475), KSHV encoded ORF27 (Gene ID: 4961487), HTLV-1 encoded envelope (Gene ID: 1491939), MCPyV encoded VP1 (Gene ID: 10987416), BKV encoded VP1 (Gene ID: 1489515) and JCV encoded Jvgp4 (Gene ID: 1489518) genes demonstrated that only EBV, HPV-16, HPV-18 and HBV are significantly associated with PCa samples as compared to BPH samples (Figs. 6A-D and SA-F). The housekeeping gene human GAPDH gene was utilized as control assuming the genomic segment bearing GAPDH gene remained unaffected in both BPH and PCa samples. A higher negative -ΔCt (average GAPDH Ct value – average target primer Ct value) indicated elevated presence of the virus in the sample as detected by specific primer set targeting specific viral gene. We further validated the association of these four selected tumor viruses in cohort-2 using second set of primers (Fig. 6A-D and Table S3). The results clearly demonstrated that all four tumor viruses were significantly associated with the PCa lesions in comparison to BPH samples (Fig. 6A-D).
Co-occurrence of tumor viruses with microbiome signature linked to BPH and PCa lesions: In order to further corroborate the connection of these tumor viruses with the identified microbial signature associated with BPH and PCa lesions, the co-occurrence and co-exclusion patterns of EBV, HPV-16, HPV-18 and HBV with the most abundant bacterial species identified in LEfSe and qPCR analyses in each category were further investigated (Fig. 7). As depicted in both LEfSe and qPCR analyses, BPH specific bacteria K. palustris and C. mixtus and PCa specific bacteria C. taiwanensis and M. organophylum were moderately correlated with each other (r = 0.362 and 0.394, respectively), indicating that they fell into two distinct groups (Fig. 7). Among BPH specific bacteria, interestingly only K. palustris was positively correlated with EBV (r = 0.446) and as expected negatively correlated with HPV-16 (r = -0.461) and to lesser extent with HBV (r = -0.263) (Fig. 7). In contrast, BPH specific bacterium C. mixtus was only found to be moderately negatively correlated with HPV-16 (r = -0.263) (Fig. 7). Among PCa specific bacteria, while C. taiwanensis was positively correlated with HPV-16 (r = 0.411) and to lesser extents with HPV-18 (r = 0.328) and HBV (r = 0.246), M. organophylum were positively correlated with EBV (r = 0.456) and to lesser extents with HPV-18 (r = 0.325) and HPV-16 (r = 0.261) (Fig. 7). Among the most abundant bacteria in both BPH and PCa tissue samples, all there species P. copri, C. campinensis and P. acnes were in general found to somewhat positively correlated with the PCa specific bacteria but not with BPH specific bacteria (Fig. 7). Among these three species P. copri and P. acnes were found to be particularly strongly correlated (r = 0.543) (Fig. 7). In addition, while P. copri was positively correlated with both C. taiwanensis (r = 0.369) and to a lesser extent M. organophylum (r = 0.246), C. campinensis and P. acnes were correlated with only M. organophylum (r = 0.318, and 0.282, respectively) (Fig. 7). While P. copri and P. acnes were robustly correlated with two tumor viruses – EBV (r = 0.551 and 0.599, respectively) and HPV-18 (r = 0.509 and 0.444, respectively), C. campinenesis was weakly correlated with only HPV-18 (r = 0.279) (Fig. 7). Among the 4 tumor viruses, EBV and HPV-18 were positively correlated with each other (r = 0.551), whereas HPV-16 and HBV were grouped together (r = 0.370) (Fig. 7).