Background:
Double-stranded DNA bacteriophages (dsDNA phages) play pivotal roles in structuring human gut microbiomes; yet, the gut phageome is far from being fully characterized, and additional groups of phages, including highly abundant ones, continue to be discovered by metagenome mining. A multilevel framework for taxonomic classification of viruses was recently adopted, facilitating the classification of phages into evolutionarily informative taxonomic units based on hallmark genes. Together with advanced approaches for sequence assembly and powerful methods of sequence analysis, this revised framework offers the opportunity to discover and classify unknown phage taxa in the human gut.
Results:
A search of human gut metagenomes for circular contigs encoding phage hallmark genes identified 3,738 apparently complete phage genomes that represent 451 putative genera. Several of these phage genera are only distantly related to previously identified phages and are likely to constitute new families. Two of the candidate families, “Flandersviridae” and “Quimbyviridae”, include some of the most common and abundant members of the human gut virome that infect Bacteroides, Parabacteroides and Prevotella. The third proposed family, “Gratiaviridae”, consists of less abundant phages that are distantly related to the families Autographiviridae, Drexlerviridae and Chaseviridae. Analysis of CRISPR spacers indicates that phages of all three putative families infect bacteria of the phylum Bacteroidetes. Comparative genomic analysis of the three candidate phage families revealed features without precedent in phage genomes. Some “Quimbyviridae” phages possess Diversity-Generating Retroelements (DGRs) that generate hypervariable target genes nested within defense-related genes, whereas the previously known targets of phage-encoded DGRs are structural genes. Several “Flandersviridae” phages encode enzymes of the isoprenoid pathway, a lipid biosynthesis pathway not previously known to be manipulated by phages. The “Gratiaviridae” phages encode a HipA-family protein kinase and a glycosyltransferase, suggesting that these phages modify the host cell wall and thereby prevent superinfection by other phages. Hundreds of phages in these three and other families encode catalases and iron-sequestering enzymes that are predicted to enhance cellular tolerance to reactive oxygen species.
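Circularity of an assembled contig is commonly inferred from a direct terminal repeat, where the last bases of the contig repeat its first bases. The abstract does not state the exact criterion used in the study, so the following is only a minimal sketch assuming an exact repeat of at least `min_overlap` bases; the function names and both thresholds are hypothetical.

```python
def terminal_repeat_length(seq: str, min_overlap: int = 20, max_overlap: int = 1000) -> int:
    """Length of the longest exact direct repeat shared by the start and end of seq.

    Assumption (hypothetical criterion): a contig counts as circular if its first
    k bases equal its last k bases for some k in [min_overlap, max_overlap].
    Returns 0 when no such repeat exists.
    """
    seq = seq.upper()
    # A repeat longer than half the contig would overlap itself; do not test it.
    upper = min(max_overlap, len(seq) // 2)
    for k in range(upper, min_overlap - 1, -1):  # longest candidate first
        if seq[:k] == seq[-k:]:
            return k
    return 0


def is_circular(seq: str, min_overlap: int = 20) -> bool:
    """True if seq carries a direct terminal repeat of at least min_overlap bases."""
    return terminal_repeat_length(seq, min_overlap) > 0
```

In practice, one copy of the repeat would be trimmed before gene calling, and circular contigs would then be screened for hallmark genes (for example, major capsid, portal and large terminase proteins) to flag apparently complete phage genomes.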
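Host assignment from CRISPR spacers rests on finding a protospacer in the phage genome that matches a spacer from a bacterial CRISPR array of known taxonomy. The matching thresholds are not given here, so this is a sketch under stated assumptions: an ungapped match on either strand with at most `max_mismatches` substitutions (a hypothetical default of 1). The spacer-to-host table and all function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_protospacers(spacer: str, genome: str, max_mismatches: int = 1) -> List[Tuple[int, int, str]]:
    """Every ungapped hit of spacer in genome as (start, end, strand).

    Coordinates are 0-based and end-exclusive on the forward strand of genome.
    Reverse-strand hits are found by scanning with the reverse complement of the
    spacer. Assumption: substitutions only, no insertions or deletions.
    """
    genome = genome.upper()
    n = len(spacer)
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        for start in range(len(genome) - n + 1):
            window = genome[start:start + n]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((start, start + n, strand))
    return hits


def infer_hosts(spacer_hosts: Dict[str, str], genome: str, max_mismatches: int = 1) -> Set[str]:
    """Host taxa whose spacers (spacer sequence -> host taxon) hit the phage genome."""
    return {host for spacer, host in spacer_hosts.items()
            if find_protospacers(spacer, genome, max_mismatches)}
```

This brute-force scan is quadratic and only suitable for illustration; genome-scale searches would normally use an alignment tool, with the same mismatch and strand logic applied to its hits.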
Conclusions:
Analysis of phage genomes identified in whole-community human gut metagenomes resulted in the delineation of at least three new candidate families of Caudovirales and revealed diverse putative mechanisms underlying phage-host interactions in the human gut. Addition of these phylogenetically classified, diverse and distinct phages to public databases will facilitate taxonomic decomposition and functional characterization of human gut viromes.

Figure 1

Figure 2

Figure 3

Figure 4
This is a list of supplementary files associated with this preprint.
Phylogenetic tree of the dnaG and dnaE genes in “Gratiaviridae” phages (.pdf).
Coverage heatmap of a “Flandersviridae” genome across human gut viromes. The coverage of the most abundant “Flandersviridae” phage genome (accession OLOC0100071.1) is plotted as a heatmap, scaled from 0 to 100× coverage per 100-bp window (.pdf).
Phylogenetic tree of the polA and dnaG genes in Flanders-like phages. Branches composed of GenBank phages are colored in orange and branches of gut metagenomic phages in blue. Branches with sequences labelled as bacteria in the GenBank database, likely representing cryptic prophages, are colored in grey (.pdf).
HHPred alignments of four “Quimbyviridae” proteins and one “Gratiaviridae” protein with their top-scoring templates, including a replication initiator protein, a cytosine-specific methyltransferase, an adenine-specific methyltransferase, a MutY nuclease and a HipA kinase (.docx).
Alignment of the template and variable repeats from the Quimbyvirus DGRs. The template repeat (TR) from Quimbyvirus is the first listed sequence and is followed by the variable repeats from either ORF80 (VR1) or ORF47 (VR2) encoded in phage genomes that are nearly identical to Quimbyvirus (>95% average nucleotide identity). A total of 21 adenine residues (green) in the template repeat exhibit at least one substitution in a corresponding variable repeat (.pdf).
Phylogenetic tree of the MCP, primase and portal proteins encoded by “Quimbyviridae” phages (.pdf).
Coverage heatmap of a “Quimbyviridae” genome across human gut viromes. The coverage of the most abundant “Quimbyviridae” phage genome (accession OMAC01000147.1) is plotted as a heatmap, scaled from 0 to 100× coverage per 100-bp window (.pdf).
Genome maps of predicted anti-CRISPR proteins (Acrs) in uncharacterized Bifidobacteria phages. Open reading frames are colored according to function: large terminase subunit (red), structural components (blue), replication (orange), integrase (pink), general function (green) and unknown (grey). The candidate Acrs are indicated with a dashed box (.pdf).
Candidate anti-CRISPR (Acr) proteins encoded on each phage genome. Proteins predicted to be Acrs with high confidence (score > 0.9) and meeting all other heuristic criteria are listed along with their inferred host. For the Bifidobacteria phages (see the main text), the nucleotide coordinates of closely related prophages integrated in their host genome are provided in columns 6-8 (.csv).
Host ranges inferred from CRISPR-spacer matches. The nucleotide coordinates of the protospacer are provided in columns 2 and 3. The sequences of the CRISPR spacer and protospacer are provided in columns 4 and 5. The taxonomic information of the host is listed in the subsequent columns (.csv).
Distribution of the marker profiles identified on each phage genome and pie chart representation of the gut phage taxonomy. (A) Histogram of marker proteins detected on each phage genome recovered from human gut metagenomes. Abbreviations: M, major capsid protein; P, portal protein; T, terminase large subunit. (B) Taxonomic assignments of the dereplicated contigs (n = 1,886), with the outermost ring corresponding to ICTV families (.pdf).
Taxonomic information for 1,886 representative phage genomes. Each taxonomically classified phage genome was scored by viralVerify, Seeker and vConTACT2, and the assignments are provided in columns 9-10, 11 and 12-13, respectively (.csv).
Clustering information for 3,738 gut phages identified in the study. Phage genomes were dereplicated at 95% identity over 80% of the contig length. The GenBank accession codes of each phage and of its representative sequence are provided in columns 1 and 3, respectively (.csv).
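The coverage heatmaps summarize read depth per 100-bp window, scaled from 0 to 100×. A minimal sketch of that binning is shown below, assuming a per-base depth list for one genome in each virome sample (for example, parsed from a read-mapping depth report). The function names are hypothetical, and clipping at 100× is one reading of "scaled from 0 to 100×".

```python
from typing import Dict, List, Sequence


def windowed_coverage(depth: Sequence[int], window: int = 100, cap: float = 100.0) -> List[float]:
    """Mean per-base depth in consecutive, non-overlapping windows, clipped to [0, cap].

    depth holds one read-depth value per genome position; the last window
    may be shorter than `window` if the genome length is not a multiple of it.
    """
    values = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        mean = sum(chunk) / len(chunk)
        values.append(min(max(mean, 0.0), cap))  # saturate the color scale at cap
    return values


def coverage_matrix(samples: Dict[str, Sequence[int]], window: int = 100,
                    cap: float = 100.0) -> Dict[str, List[float]]:
    """One heatmap row per virome sample: sample name -> list of window values."""
    return {name: windowed_coverage(depth, window, cap) for name, depth in samples.items()}
```

Each row of the resulting matrix corresponds to one virome and each column to one 100-bp window along the phage genome, which is the layout a heatmap plot expects.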
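DGR mutagenesis is adenine-specific, which is why the supplementary alignment counts template-repeat adenines that are substituted in at least one variable repeat. The sketch below reproduces that count for repeats that are already aligned to equal length without gaps. This is an assumption, since the alignment method is not described, and the function name is hypothetical.

```python
from typing import Iterable, List


def variable_adenines(template: str, variables: Iterable[str]) -> List[int]:
    """0-based positions where the template repeat (TR) carries an adenine and at
    least one variable repeat (VR) carries a different base.

    Assumes TR and VRs are pre-aligned to the same length; a '-' in a VR is
    skipped rather than counted as a substitution.
    """
    template = template.upper()
    vrs = [v.upper() for v in variables]
    for v in vrs:
        if len(v) != len(template):
            raise ValueError("all repeats must be aligned to the template length")
    positions = []
    for i, base in enumerate(template):
        if base != "A":
            continue  # only adenines in the TR are targets of DGR mutagenesis
        if any(v[i] not in (base, "-") for v in vrs):
            positions.append(i)
    return positions
```

The number of variable adenines is then `len(variable_adenines(tr, vrs))`. Substitutions at non-adenine template positions are ignored by design, since they would not be expected from adenine-specific mutagenesis.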
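Dereplication at 95% identity over 80% of the contig length can be pictured as greedy clustering. The sketch below assumes that pairwise percent identity and percent of the query length aligned have already been computed by an external aligner, and that genomes are processed longest first so that the longest member becomes the representative. Both choices, and the function name, are assumptions rather than the study's stated procedure.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def dereplicate(lengths: Dict[str, int], pair_stats: Dict[Pair, Tuple[float, float]],
                min_identity: float = 95.0, min_coverage: float = 80.0) -> Dict[str, str]:
    """Greedy clustering; returns a mapping genome -> representative genome.

    pair_stats[(query, ref)] = (percent identity, percent of query length aligned).
    Missing pairs are treated as unrelated. Each genome joins the first existing
    representative it matches; otherwise it becomes a new representative.
    """
    representatives: List[str] = []
    assignment: Dict[str, str] = {}
    # Longest genomes first (ties broken by name for a deterministic order).
    for genome in sorted(lengths, key=lambda g: (-lengths[g], g)):
        for rep in representatives:
            identity, coverage = pair_stats.get((genome, rep), (0.0, 0.0))
            if identity >= min_identity and coverage >= min_coverage:
                assignment[genome] = rep
                break
        else:  # no representative matched
            representatives.append(genome)
            assignment[genome] = genome
    return assignment
```

A mapping of this kind corresponds to the accession and representative columns of the clustering table.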
Posted 28 Jan, 2021
Posted 14 Oct, 2020
Received 29 Dec, 2020
Received 22 Nov, 2020
Invitations sent on 25 Oct, 2020