Performance and output of the combined 16S rRNA gene and ITS amplicon sequencing
With the rapidly expanding list of microbes being characterized for their role in human health and disease, it is crucial to be able to identify and differentiate between i) commensal microbes that co-exist in a symbiotic state within various niches of interest within the human body and ii) pathogens that induce dysbiosis and drive several clinical conditions. Studies targeting colonization patterns of fungi are particularly scarce, especially for the vaginal ecosystem when compared to bacteria [5,6]. A simple text search performed on the NCBI nucleotide database shows that there are 209,784 entries for vaginal specific bacteria, while there are only 1,458 entries for fungi, highlighting the disparity. We show that a combined sequencing method that targets the 16S rRNA gene for bacteria and the fungal ITS region is a convenient strategy to study two distinct and clinically relevant microbial kingdoms in parallel without loss of data or incurring notable extra costs.
To our knowledge, this work is the first to report an amplicon sequencing method to study bacteria and fungi simultaneously within human samples using Illumina sequencing technology. Previously, Kittleman et al. reported a pyrosequencing approach that combined analysis of phylogenetic marker genes for bacteria, archaea and eukaryotic micro-organisms in ruminal samples and Coller et al. combined 16S rRNA gene and fungal ITS1 amplicons to an Illumina MiSeq run to study soil samples [43,44]. In addition, many studies have generated 16S rRNA gene and fungal ITS amplicons in the same samples, but the analyses were done separately [8,9,13–18]. Also, bacterial ITS amplicons have been employed for improved taxonomic resolution in parallel to conventional 16S rRNA gene amplicon sequencing [45,46]. Kits for single library constructions for bacteria and fungi are also commercially available (at least from Swift, Qiagen and Perkin Elmer), but the costs per sample are substantially higher and hence our method provides a cost-effective option for laboratories that process high sample volumes and possess the basic expertise for the index-PCR. In earlier studies where a single Illumina run was used for sequencing of both bacteria and fungi, the index-PCR was performed separately for each type of amplicon and they were only combined for sequencing [43,44]. Hence, to our knowledge we are the first to report a protocol where the pooling of bacterial and fungal amplicons takes place already after the locus-specific PCRs, and hence saves time and costs as all downstream steps including index PCR for both amplicon types occur in a single well. Use of the same barcodes for bacterial and fungal amplicons further simplifies the design and improves the efficiency especially when there are limited number of barcodes available.
By combining 16S rRNA gene and ITS amplicon sequencing in a parallel setting we were able to achieve comprehensive community composition data for both bacterial and fungal communities with very little added cost compared to ordinary 16S rRNA gene sequencing. The 16S rRNA gene and ITS libraries can be separated easily with publicly available bioinformatic tools like cutadapt that was used here and was the only additional step in pre-processing. The rarefaction curves indicated that with the chosen input of 177 vaginal samples in a single MiSeq run, both bacterial and fungal species were sequenced to an extent sufficient to represent their true diversity (Additional file 1: Supplement Fig. 3). Comparisons to conventional 16S rRNA gene amplicon and metagenomic sequencing demonstrated that the newly developed method produced highly similar results for bacterial compositions and there was a high degree of reproducibility between the results despite some methodological differences (HiSeq vs MiSeq, PE250 vs. PE300 chemistry) as a possible source of variation.
The simultaneous analysis of bacterial and fungal communities from the same samples is not a trivial approach since efficient and unbiased cell disruption is required to retrieve representative nucleic acids for further analysis. We used bead beating method for DNA extraction that has been previously shown to perform well on disrupting the rigid cells walls of fungi [47]. The analysis of fungal components of the microbiota is much more challenging than for bacteria e.g. due to the natural length polymorphism of ITS1 length that may complicate both the actual sequencing and subsequent bioinformatic analyses.
Text-mining based annotation and prioritization of hits
Achieving species level taxonomic annotations is often desired and crucial when investigating homogenous ecological niches such as the female genital tract, which is predominantly populated by different species of lactobacilli, rendering it necessary to attempt for identification of individual species and their patterns of interaction with the host. Presently, sequence homology is the main criterium used for taxonomic classifications within amplicon sequencing workflows, which however cannot distinguish between highly similar sequences, resulting in only a fraction of the sequence queries being assigned to species-level. This is also challenging in BLAST-based annotations, which can potentially produce thousands of highly similar alignment results and have significant amount of noise that may mask real, more relevant annotations. Conventionally, selecting results from the BLAST output would either involve taking the “top-hit” from a sorted list of annotations or require manual scanning. We show that by utilizing readily available information from public databases the selection of the most probable hit can be facilitated by means of knowledge-based species level ecosystem specific taxonomic annotations. By using BLAST as the alignment tool of choice and the nucleotide database as the reference to maximize the number of species that are potentially detected, our approach automates this selection process. Taxonomic annotations are split into a two-tiered process: i) sequence homology is determined by BLAST algorithms, with a threshold set at 98% for both “percentage identity” and “query coverage” and ii) ambiguity between identical alignments is eliminated by introducing background information such as previously identified host and source of isolation as a selection criterion to identify the most relevant hit. Hypothetically, given a set of 10 indistinguishable species level annotations, this approach allows us to probabilistically select the 5th alignment indicated to have been isolated from human vaginal swabs, and disregarding the other 9 that came from diverse sources including chicken crop, beer malt, and giant pandas. The filtration can be strict enough to look for specific strains of a microbe investigated in an environmental niche, or vague enough to accept anything collected from human samples.
While we demonstrate this tool in combination with BLAST based annotations, it can essentially be used as an added layer to any number of existing tools that can provide a link to NCBI accession IDs or attach previously gathered information for each alignment. By allowing any combination of user defined text queries, it can be customized for each individual study and enhances existing tools by increasing confidence in the final taxonomic annotations. In addition, information such as publications, journals, authors, lineage, and strain level resolution are also made available in the process. Strain identification is of particular interest as it is needed to identify and study the established biological and functional disparity in microbial strains, as discussed later [11,48,49].
We acknowledge that this approach it limited by the inherent limitations of 16S rRNA gene amplicons on discriminating bacteria to species level so that in less characterized ecosystems even the best guess, based on homology and background information, can be of low confidence. Another limitation may come from data availability, especially in terms of identifying novel species for a particular environment. However, the latter is remedied by the trickle-down application of the selection process, where the most important and most likely annotations are assigned first, followed by expanding the filtration criteria to include the remaining potentially rare/novel taxa with decreasing confidence.
Microbial profiles
Whole genome shotgun (WGS) and an individual 16S rRNA gene sequencing method validated the bacterial profiles obtained from our parallel sequencing approach. Since the samples were from healthy subjects, they were dominated by lactobacilli, while samples with mixed/BV-like microscopy results [19] were indeed populated with G. vaginalis, F. vaginae, Veillonella spp., uncultured bacteria, and C. genomosp. (BVAB1). Since we report relative abundances, the discrepancy observed in the proportion of G. vaginalis observed in some samples could be attributed to the absence of these uncultured bacteria from the metagenome results, which are otherwise make up to 10% of the total abundance of the samples in question. The inability of metagenome pipeline to detect these microbes is largely due to incomplete reference databases. While we were not specifically searching for individual strains, the final annotations obtained through the filtration approach contained 11 different strains of G. vaginalis (see Additional file 1: Supplement Fig. 5), a microbe associated to bacterial vaginosis (BV), which have previously been described to exhibit varying capacities for sialidase (an enzyme responsible for mucus degradation), and beta galactosidase production [50–53]. Different strains of G. vaginalis have been identified with varying degrees of virulence based on the presence/absence of the genetic machinery facilitating adherence, biofilm formation, cytotoxicity, glycan degradation, antibiotic resistance, and displacement of lactobacilli [52]. This is indicative of the tendency of some G. vaginalis strains to colonize a non-BV vaginal environment as a commensal microbe while other strains can exhibit pathogenicity, highlighting the importance of achieving strain level resolution to understand their biological characteristics and as a diagnostic tool. As our samples were from healthy subjects, this could represent an early or intermediate state of dybiosis.
While we obtained high concordance between the bacterial profiles obtained with our combined amplicon sequencing vs. WGS approach, fungi could only be detected with the former. This is due to the underrepresentation of fungal reads within the metagenomes as well as their coverage in the reference databases used for WGS approach. The yield for fungal reads within the metagenomes was 0.08 - 0.17% in our study and 0.17±0.04% in a previous large scale vaginal microbiota study [11]. Eukaryotic genomes are also significantly larger and much more complex compared to prokaryotes, which further hinders metagenome analysis, especially with a restricted set of reference databases [54]. Apart from S. cerevisae being detected in one sample (ID: 1192) with MetaPhlan, there were no positive hits in the WGS data while 23 unique species were detected across the 50 samples by ITS amplicon sequencing. This shows that even in this era of relatively low-cost shotgun metagenomic sequencing, amplicon sequencing still has the edge when studying complex low abundance communities. With the lack of studies focusing on fungi, deposited fungal sequences with valid human isolation sources are also scarce. In our samples only C. albicans, a well-characterized pathogenic yeast responsible for vulvovaginal candidiasis [5], was identified from vaginal samples. The remaining annotations were traced back to either the gut (Aspergillus spp.), skin (Malassezia spp. and Cladosporium spp.), or other clinical isolates (S. cerevisiae, Schizophyllum spp.). Of these annotations, Cladosporium spp. were a possible wildcard because of the uncertainty surrounding their isolation source. On one hand, the metadata for Cladosporium spp. hits stated that they were isolated from clinical human samples and certain species have been associated to human and animal infections. On the other hand, they are also widely recognized as household fungi and are shown to be a consistent environmental contaminant in fungal studies [8,55,56].
Investigating the correlations between the micro- and mycobiota revealed some interesting trends, notably the positive correlation between C. albicans and S. anginosus. Both microbes are individually implicated in negative clinical outcomes and Streptococci spp. have been shown to co-aggregate with C. albicans, which allows it to form a thicker biofilm and obtain antibiotic resistance [57,58]. Prevotella spp. and Malassezia spp. have previously been shown to positively correlate on skin, which opposed our findings, indicating either a different colonization pattern in this ecological niche or presence of different strains [59]. Positive correlation between L. reuteri and an uncultured fungus of genus Peniophora could potentially be attributed to their shared ability to bind/sequester and degrade aflatoxin B1, a mycotoxin produced by Aspergillus spp., as a survival mechanism [60–62]. We speculate that this might be indicative of an evolutionary characteristic that these microbes possess to inhabit the same environment. Among the Cladosporium species, the only observed correlation was a negative one between C. allicinum and G. vaginalis [63,64]. Due to the scarcity of information on the vaginal fungal communities and their potential associations to clinical and other host factors, we carried out the same PERMANOVA analysis for fungi as previously done for the bacterial microbiota of the same samples [65]. We found number of sex partners (lifetime), period day, history of BV, and antibiotic and probiotic use to be associated with fungal abundance profiles while education was the strongest factor explaining bacterial variation [65]. This divergence suggests that the drivers of colonization patterns for vaginal bacteria and fungi may be independent of each other [65]. For the full PERMANOVA analysis results see Additional file 2.
Functional profiles
Even with a relatively small sample size, functional profiles obtained from the metagenome analysis revealed clear separation of bacterial profiles and their metabolic potential. The capacity for L. crispatus to synthesize lysine has been previously identified as well as its absence in L. iners [66]. An interesting observation was the co-occurrence of the almost complete mevalonate pathway (six enzymes), which produces cholesterol as an end-product, and inerolysin, a cholesterol dependent cytolysin (CDC) in samples containing L. iners. Presence of the mevalonate pathway has been indicated in urogenital bacterial populations and associated to the ability of certain lactic acid bacteria to cope with oxidative stress, which may be an additional adaptation that allows L. iners to colonize hostile environments as well as co-habit with other microbes, especially G. vaginalis [66–68]. CDCs such as inerolysin and vaginolysin (G. vaginalis specific) have been associated to membrane adhesion and subsequent pathogenicity. Pore formation in the host cell membranes, a characteristic of CDCs, has been speculated to allow the microbes to gain access to nutrients and potentially kill immune cells, allowing L. iners to survive in a dysbiotic state [66,69,70]. A repertoire of enzymes involved in the biosynthesis and degradation of inosine monophosphate (IMP) were observed in samples containing G. vaginalis. Even though the enzymes associated with degradation could not be assigned to a single taxonomy, their exclusivity in these samples already shows an interesting trend. Other studies have discussed inosine in various contexts, including as a metabolite from related intestinal Bifidibacterium pseudolongum that modulates an enhanced immunotherapy response, while for some strains of G. vaginalis it has been indicated as a carbon source [71,72]. These findings warrant further investigations to understand the role of inosine metabolism among certain G. vaginalis strains within the vaginal ecosystem.