Using the fecal microbiome of the Canada goose, we sought to characterize the function (using the metatranscriptome; “MT”), functional potential (using the metagenome; “MG”), and taxa (using 16S rRNA gene sequencing; “16S”) of a complex wild animal’s microbiota. We also sought to determine the accuracy and utility of metagenome simulation software in a non-model organism (using PICRUSt2 [26]; “simulatedMGs”). We annotated the -omics data into three groups: KEGG orthologies (KOs), KEGG enzyme commission numbers (ECs), and MetaCyc pathways [47, 51, 52]. To determine the accuracy of predictions, we compared the raw number of identified KOs, ECs, and pathways in each data type (MGs, MTs, and simulatedMGs). We then examined specific groups of ECs, KOs, and pathways to determine how accurately the simulatedMGs captured biologically relevant genes and pathways. These groups included antimicrobial resistance genes (AMRs) and genes and pathways critical for digestion in herbivores, including those for short-chain fatty acid (SCFA) production [3, 48, 53, 54].
PICRUSt2 is a computational alternative to metagenomic sequencing with high accuracy in many systems [26]; however, because it depends on genomic databases that are largely built from microbiota associated with humans and model organisms, simulations of non-model organisms’ and environmental microbiomes appear to be less accurate [26, 28]. Our simulatedMGs did not differ significantly from the MGs in the number of KOs and ECs identified, whereas the simulatedMGs contained significantly fewer pathways than the MGs. Notably, because PICRUSt2 relies on 16S rRNA data, the simulatedMGs only include bacteria and archaea. When we removed all of the non-prokaryotic KOs, ECs, and pathways, the simulatedMGs contained significantly more KOs than the MGs, and the significant difference in the number of pathways between the simulatedMGs and the MGs was no longer present (Fig. 2). The MTs consistently had significantly fewer KOs, ECs, and pathways than the simulatedMGs and MGs, irrespective of the presence or absence of non-prokaryotic data. Thus, for these data, PICRUSt2 over-predicted the number of KOs, but identified numbers of prokaryotic ECs and pathways commensurate with the MGs. It is possible that the difference observed in KOs could be the result of insufficient sequencing depth in the MGs. However, rarefaction plots of the KOs showed that almost all of the MGs and MTs plateaued (Figure S3); therefore, we do not think increasing sequencing depth would rectify the discrepancy between the simulatedMGs and the sequenced data. The low number of identified KOs, ECs, and pathways in the MTs was not a surprise, as not all genes and pathways are actively transcribed at the same time, and those that were not transcribed were not detected.
Principal coordinate analysis (PCoA) showed generally non-overlapping clustering of the MGs, MTs, and simulatedMGs (Fig. 3). Significant differences between the data types were observed for the KO, EC, and pathway data (PERMANOVA: p < 0.001). These differences were observed regardless of the presence or absence of the non-prokaryotic data. We thus conclude that the non-prokaryotic KOs, ECs, and pathways affect the raw counts (discussed above), but their removal does not change the significant differences observed between the MGs, MTs, and simulatedMGs as whole samples. The simulatedMGs and MGs shared large portions of KOs, ECs, and pathways (Table 1); however, many KOs, ECs, and pathways were found in only the MGs or in only the simulatedMGs (Fig. 4). The MTs shared lower percent similarity with the MGs and simulatedMGs; however, this is likely because the MGs and simulatedMGs identified KOs, ECs, and pathways that were not being transcribed. The greater identification of KOs, ECs, and pathways by the MGs and simulatedMGs is further supported by the fact that, on average, only 23 KOs, 7 ECs, and 2 pathways were found in only the MTs. This suggests that the majority of the KOs, ECs, and pathways found in the MTs were also identified by the MGs, the simulatedMGs, or both. Therefore, with these data, our simulatedMGs both overpredict and fail to predict numerous KOs, ECs, and pathways when compared to sequenced data.
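The shared and unique feature counts discussed above can be illustrated with a minimal sketch. It assumes each data type has been reduced to a set of feature identifiers (KOs, ECs, or pathways) for a sample. The function names, the toy identifiers, and the use of a Jaccard-style percentage are illustrative assumptions; the exact percent-similarity definition behind Table 1 is not restated here.

```python
def partition_features(mg, mt, sim):
    """Split feature IDs into the seven regions of a three-way Venn diagram."""
    mg, mt, sim = set(mg), set(mt), set(sim)
    return {
        "all_three": mg & mt & sim,
        "mg_sim_only": (mg & sim) - mt,
        "mg_mt_only": (mg & mt) - sim,
        "mt_sim_only": (mt & sim) - mg,
        "mg_only": mg - mt - sim,    # sequenced but not predicted or transcribed
        "mt_only": mt - mg - sim,    # seen only in the metatranscriptome
        "sim_only": sim - mg - mt,   # predicted but never sequenced
    }


def percent_shared(a, b):
    """Jaccard-style percent similarity (an assumed definition)."""
    a, b = set(a), set(b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Hypothetical toy identifiers, not data from this study.
toy_mg = {"K1", "K2", "K3", "K4"}
toy_mt = {"K1", "K5"}
toy_sim = {"K1", "K2", "K6"}

regions = partition_features(toy_mg, toy_mt, toy_sim)
for name, members in regions.items():
    print(name, sorted(members))
print(percent_shared(toy_mg, toy_sim))
```

In this framing, `sim_only` corresponds to overprediction by the simulation and `mg_only` to features the simulation failed to predict.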
Folivore diets are inherently nutrient poor, and organisms that consume large quantities of grasses and other leafy plants rely on their microbiome to break down cellulosic and hemicellulosic foodstuffs into digestible components that the host can then absorb [3, 49]. The fermentation of dietary fibers into SCFAs has generally been reported in foregut- or hindgut-fermenting organisms, which are characterized by long passage times of the ingested food, frequently requiring several hours to a few days to fully digest [3, 55]. Conversely, geese have significantly shorter passage times, ranging from a few hours to as short as 30 minutes, and show limited microbial fermentation of plant matter in their gastrointestinal tracts [54, 56]. It has been suggested that geese rely primarily on a high-volume strategy in which they extract small amounts of nutrients from copious amounts of food and use their proventriculus and gizzard to chemically digest and physically crush the plant matter [54, 57]. Our MGs and MTs identified numerous ECs related to cellulose and hemicellulose degradation, as well as SCFA pathways commonly found in ruminants and other herbivores [3, 48, 58]. Therefore, the Canada goose fecal microbiome has the potential to ferment using a plethora of pathways; we consistently found 16 SCFA pathways that produce acetate, butyrate, propionate, or lactate in most of the MGs and MTs (Fig. 5). The majority of the MTs were missing four pathways that were present in the majority of the MGs, but our data show that the Canada goose microbiome is actively producing acetate, butyrate, lactate, and propionate (Fig. 5). We suspect the discrepancy between the MTs and the MGs arose either because the necessary substrates were not present, and thus there was no need for the particular pathway, or because of how pathways are quantified in HUMAnN2.
HUMAnN2 requires at least one read for each step of the pathway to be present for it to be counted; therefore, it is possible that one of the necessary enzymes in a pathway was already present in sufficient quantities and was not being actively transcribed, leading to the pathway appearing as absent. The simulatedMGs identified 14 of the 16 pathways found in the MGs, with two notable exceptions. PROFERM-PWY (L-alanine fermentation to propionate and acetate) and GLUDEG-II-PWY (L-glutamate degradation VII to butanoate) were identified in most of the MGs, but were not found in any of the simulatedMGs. The simulatedMGs were less accurate at identifying ECs relevant to fiber digestion, and more accurate at identifying complete SCFA pathways.
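The all-steps-present rule described at the start of this paragraph can be sketched as follows. This is a simplified illustration of the rule as stated in the text, not HUMAnN2's actual implementation; the function name, step labels, and read counts are hypothetical.

```python
def pathway_detected(step_reads):
    """Return True only if every step of the pathway has at least one read.

    step_reads maps each pathway step (e.g. an enzyme) to its read count.
    An empty mapping is treated as "not detected".
    """
    return bool(step_reads) and all(count >= 1 for count in step_reads.values())


# Hypothetical three-step pathway observed in a metagenome: all genes present.
mg_steps = {"step_1": 12, "step_2": 4, "step_3": 7}

# Same pathway in a metatranscriptome: step_2 is not transcribed at sampling
# time (for example, because enough enzyme is already present), so the whole
# pathway is scored as absent even though the other steps are expressed.
mt_steps = {"step_1": 9, "step_2": 0, "step_3": 3}

print(pathway_detected(mg_steps))
print(pathway_detected(mt_steps))
```

Under such a rule, a single untranscribed step is enough to make a pathway present in the MG appear absent in the MT.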
The presence of SCFA pathways in the MGs and MTs suggests that the Canada goose microbiome is both capable of fermenting and actively fermenting the food the geese ingest, which is contrary to the accepted dietary strategy for many geese. For example, the white-fronted goose (Anser albifrons), the bean goose (Anser fabalis), and the swan goose (Anser cygnoides) have all shown limited ability to ferment plant matter [59]; however, the Taihu goose (Anser cygnoides) [60] does use microbe-dependent fermentation [54]. It is possible that microbial fermentation is species dependent and varies across goose species. Canada geese most likely rely on the high-volume strategy and mechanical means to break down their food, but our data suggest they may also benefit from their microbiome’s fermentation of the ingested plant matter. Because we see that fermentation is occurring in an organism with a quick passage time, further study is necessary to determine the speed and efficiency of fermentation by the Canada goose’s microbiome, as well as which bacteria are participating.
Antimicrobial resistance is a growing concern in the medical community, and infection caused by an antibiotic-resistant bacterium can lead to dangerous medical situations and can be lethal [61]. Because of the dangers posed by antibiotic resistance, it is crucial to describe and understand reservoirs of antimicrobial resistance genes (AMRs). Due to the frequency of defecation and their preferred habitat, geese can potentially introduce numerous antibiotic-resistant bacteria in close proximity to humans. We compiled AMR KOs from the KEGG database and found 101 AMR KOs. More than half of these (54) were found in at least five samples in the MGs, MTs, or simulatedMGs. Vancomycin, aminoglycoside, and macrolide-lincosamide-streptogramin KOs were the most frequently detected AMRs in the Canada goose. The simulatedMGs overpredicted the number of AMR KOs, identifying 24.24% more AMR KOs than the MGs (Table S4). One possible explanation for the overprediction is that antibiotic resistance in bacteria is of interest to human health, and there may be a bias towards sequencing the genomes of AMR bacteria. This could have led to an overrepresentation of bacteria with AMR genes in the databases used for metagenome predictions.
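For clarity, an overprediction percentage of this kind can be computed as a relative difference, as in the short sketch below. The formula is an assumed reading of how the reported figure was derived, and the counts are hypothetical placeholders rather than values from Table S4.

```python
def percent_overprediction(predicted_count, sequenced_count):
    """Relative excess of predicted features over sequenced features, in percent."""
    if sequenced_count <= 0:
        raise ValueError("sequenced_count must be positive")
    return 100.0 * (predicted_count - sequenced_count) / sequenced_count


# Hypothetical counts of AMR KOs (not taken from this study).
amr_kos_in_mgs = 50
amr_kos_in_simulated_mgs = 60

excess = percent_overprediction(amr_kos_in_simulated_mgs, amr_kos_in_mgs)
print(f"{excess:.2f}% more AMR KOs in the simulated metagenomes")
```

A negative result would indicate that the simulation identified fewer AMR KOs than the sequenced metagenomes.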
The Canada goose microbiome had large quantities of Actinobacteria, Proteobacteria, and Firmicutes. The majority of the MTs were dominated by viruses and were also missing Bacteroidetes, which was found in large quantities in the MGs and simulatedMGs. In Canada geese from Maryland, Ohio, and Ontario, Actinobacteria was a minor phylum in the fecal microbiomes (3.8%, 11.3%, and 2.7%, respectively) [37, 40]. Our data estimated Actinobacteria to be much more prevalent in the MGs, comprising an average of 22% of the phyla identified in the MGs and 7% of the phyla identified in the simulatedMGs. It is possible this difference is due to sampling locality: our samples originated from the northeast United States, and geographic distance influences the abundance of different microbes in the environment [62]. Bacteroides appears to be a major constituent of the microbiome in Canada geese across different populations [37, 40], and in our MG and simulatedMG data as well; however, Bacteroidetes was not detected in our MTs. This suggests that Bacteroidetes might appear as a key constituent of the microbiome, but may not be a particularly active member, and could be a transient phylum. More research should be conducted to verify whether Bacteroidetes is an important member of the goose microbiome.
Our results support existing data that show the microbiome of Canada geese to be less diverse and to have lower bacterial loads than that of other birds [32]. We were able to get high genus-level resolution, identifying 17 genera at greater than 5% of the overall relative abundance; of those, nine were responsible for large portions of the total KOs identified in the metagenomes (Fig. 7). Many of the genera identified contain species that are coliforms and pathogens: e.g., Helicobacter, Escherichia/Shigella, Campylobacter, Bacteroides, and Clostridium. One of the more consistent genera across the samples was Subdoligranulum, a close relative of Faecalibacterium, which has been primarily found in humans and is characterized as a producer of butyrate [63]. When we compared the relative contribution of these nine genera to the total KOs identified, we saw that Subdoligranulum and Faecalibacterium were responsible for a large portion of the identified KOs in the MGs. Other genera that contributed large portions of the identified KOs were Turicibacter, Megamonas, Helicobacter, Escherichia, Curtobacterium, Campylobacter, and Bacteroides. Our data show that a few genera can be responsible for large portions of the identified KOs in most of our samples (Fig. 7). This suggests that in most Canada geese, these genera are key members and are responsible for large portions of the functional potential of the microbiome. This could vary from population to population; little is known about the compositional variation of the Canada goose’s microbiome across its entire range. A common axiom in microbiology is “a diverse microbiome is important for gut health”. Our data suggest that in the Canada goose, having a diverse microbiome might not be as critical as in other organisms, and the processes needed for the goose’s health may be attained from relatively few taxa.
Viral pathogens in Canada goose feces are also of interest to the public and can be identified using -omics methods. Characterization of the viromes in the MGs and MTs (Fig. 1C) found that Bromoviridae was the most abundant virus family, with x̄ = 43.7% (MTs) and x̄ = 55.4% (MGs). Bromoviridae are plant RNA viruses, and we speculate that we detected them in the MGs because the viruses might have been undergoing reverse transcription. The next most abundant virus families were Alphaflexiviridae in the MTs (x̄ = 16.3%) and Retroviridae in the MGs (x̄ = 12.5%). We did not find any viruses of concern to human health, but there were some viruses in trace amounts that can infect geese, e.g., UR2 sarcoma virus (Retroviridae), goose adenovirus A (Adenoviridae), avian myelocytomatosis virus (Retroviridae), and avian sapelovirus (Picornaviridae). We also detected plant viruses such as peanut stunt virus (Bromoviridae), tobacco ringspot virus (Secoviridae), and tobacco mosaic virus (Virgaviridae) (Table S7). Therefore, Canada goose feces may be a potential vector for agricultural crop viruses. More work is needed to fully assess the potential of Canada geese as vectors of crop viruses and of avian pathogens.