Optimization and Evaluation of Viral Metagenomic Amplification and Sequencing Methods Toward a Genome-level Resolution of the Human Fecal DNA Virome

doi:10.21203/rs.3.rs-1097721/v1


Research


This work is licensed under a CC BY 4.0 License

Version 1


Background: Viruses in the human gut have been linked to health and disease. Deciphering the gut virome depends on metagenomic sequencing of virus-like particles purified from fecal specimens. A major limitation of conventional viral metagenomic sequencing is the low recoverability of viral genomes from the metagenomic dataset.

Results: Herein, we developed an optimized method for viral amplification and metagenomic sequencing to maximize the recovery of viral genomes. Using 5 fecal specimens with multiple repetitions, we determined the optimal number of PCR cycles for high-fidelity enzyme-based amplification, assessed the reliability of multiple displacement amplification in virome DNA preparation, verified the reproducibility of the whole optimized viral metagenomic experimental process, and tested the capability of long-read sequencing to improve viral metagenomic assembly. Based on our optimized results, we recovered 151 high-quality viral genomes using the combined data from short-read (15 cycles of PCR amplification) and long-read sequencing. Genomic analysis of these viruses found that most (60.3%) of them were previously unknown and showed a remarkable diversity of viral functions, notably 206 viral auxiliary metabolic genes. Finally, we compared the viral metagenomic and bulk metagenomic sequencing approaches and revealed significant differences between them in the efficiency and coverage of viral identification.

Conclusions: Our study demonstrates the potential of optimized experimental and sequencing strategies for uncovering viral genomes from fecal specimens, which will facilitate future research on genome-level characterization of complex viral communities.

General Microbiology

Fecal DNA virome

Virus-like particles

Optimal method

High-quality viral genome

The human gut is a reservoir of numerous microbes including bacteria, archaea, eukaryotes, and viruses[1, 2]. Viruses are the smallest organisms in the human gut microbiome, but they may be the largest in terms of the variety of species[3] and the number of organisms, with 10⁹-10¹² viral particles per gram of feces[4, 5], and likely play important roles in shaping the gut microbiome.

As a valuable untargeted tool, next-generation high-throughput sequencing has been widely used to characterize the viral community (or “virome”) of the human intestine. Currently, there are two main approaches for sequencing viruses from fecal samples: direct whole-metagenome sequencing (referred to as bulk metagenomic sequencing) and virus-like particle (VLP) enrichment followed by metagenomic sequencing (referred to as viral metagenomic sequencing)[6]. Bulk metagenomic sequencing extracts the whole genetic material of the gut microbiome; although the proportion of viral sequences is relatively low, it has the advantages of simple experimental operation and a high success rate of library construction for next-generation sequencing. Additionally, a large number of publicly available bulk metagenomic datasets have provided massive input material for virome mining, promoting the generation of several large-scale viral genome catalogs, such as the Human Virome Database (HuVirDB)[7], Gut Virome Database (GVD)[8], Metagenomic Gut Virus catalog (MGV)[6], and Gut Phage Database (GPD)[3]. On the other hand, although its experimental process is more complex, viral metagenomic sequencing represents viral communities more comprehensively and improves detection sensitivity for low-abundance and rare viruses[9]. For these reasons, a series of studies have used viral metagenomic sequencing to delineate gut viral diversity and structure, showing that the gut virome of healthy adults is highly diverse, individual-specific, and broadly stable over long periods[10].
On the basis of these two approaches, a growing number of viral metagenomic studies have also shown the relevance of the gut virome to human health and disease, highlighting a considerable role of the viral community in multiple systemic disorders including rheumatoid arthritis[11], nonalcoholic fatty liver[12], alcoholic hepatitis[13], colorectal cancer[14], inflammatory bowel disease[15], and obesity-related type 2 diabetes[16]. These studies have greatly advanced the understanding of the extensively unexplored diversity of the human virome.

In practice, analysis based on viral metagenomic sequencing is often insufficient to characterize individual viruses at the genome level. Because of its low biomass, a viral metagenome inevitably requires multiple polymerase chain reaction (PCR) amplification procedures to obtain sufficient DNA for high-throughput sequencing[17]. For the high-fidelity enzyme approach, 30-40 amplification cycles are commonly used, which may result in severe stochastic or systematic bias in viral genome fragmentation, skew viral abundances, and over-amplify small circular single-stranded DNA (ssDNA) viruses[6, 18-21]. Such amplification leads to an uneven distribution of read coverage across the viral genome, which hinders genome reconstruction from the sequencing fragments[22, 23]. Reducing the number of PCR cycles effectively reduces the amplification bias but also reduces the yield of PCR product, which makes it necessary to determine the optimal cycle number for amplification. Currently, multiple displacement amplification (MDA) has been proven efficient for amplifying very small amounts of DNA and is frequently used in viral metagenomic analysis[24, 25]. However, this method also has a large amplification bias and can generate numerous template-independent errors[26], which limits its wide application in genome-level analysis of the gut virome.

In addition to optimizing viral amplification experiments, another promising strategy for viral genome analysis is long-read sequencing. Long-read viral metagenomic analysis based on nanopore sequencing has been preliminarily validated using mock viral communities as well as human oral virome samples, providing significant improvements in the recovery of viral genomes[27, 28]. Long-read sequencing can theoretically capture near-complete viral genomes in single reads, overcoming the challenges of assembling viral genomes from short-read data. However, it requires a 10-fold larger total amount of viral DNA with high integrity and molecular weight[9]. Conveniently, MDA can produce relatively long amplification products and can thus help to prepare input material for long-read sequencing.

In this study, we propose optimized viral metagenomic amplification and sequencing strategies that aim to maximize the recovery of viral genomes from human fecal specimens. We performed parallel virus enrichment and DNA extraction to generate ~30 viral DNA samples from each of 5 fresh fecal specimens and conducted experiments that included 1) optimizing the cycle number for PCR amplification, 2) evaluating the reproducibility of the whole optimized viral metagenomic experimental process, 3) evaluating the reliability of MDA, 4) evaluating the performance of long-read nanopore sequencing for metagenomic assembly, and 5) comparing viral metagenomic and bulk metagenomic sequencing. Our analyses led to a recommended optimized strategy and uncovered hundreds of metagenome-assembled viral genomes from these specimens, including 151 high-quality viruses that are amenable to in-depth taxonomic and functional characterization.

Fecal specimen collection and preprocessing

The fecal specimens were collected near the laboratory and processed within 1 hour. More than 20 g of fresh fecal specimen per person was collected from five healthy volunteers using a sterile fecal collection box. To verify the repeatability of the experiment, the fresh fecal specimens were stirred and mixed with a sterile tongue depressor, aliquoted into 35-50 subsamples of equal mass (0.17 g per tube) in 1.5 mL sterile tubes, and stored at -80 ℃. This study was approved by the ethics committee of Dalian Medical University [No. 2020-014], and informed consent was obtained from all participants.

Experimental grouping and process

To answer the 5 key questions in gut DNA virome research shown in Figure 1, we designed a scheme with 5 corresponding experimental routes and compared their results. First, because differences in PCR template can cause a higher proportion and more divergent patterns of biased contigs[24] in fecal viromes obtained by MDA or Q5 high-fidelity enzyme amplification, excess virome DNA extracted in the same batch from 16 tubes of fecal subsamples per person was mixed thoroughly so that all subsequent MDA or high-fidelity enzyme amplifications used identical templates (except for the repeatability verification experiments, which used individual subsamples; Fig. 1a). To determine the effect of the number of PCR cycles on amplification bias during virome DNA preparation, we screened for the optimal number of PCR cycles among 0 (PCR-free), 5, 15, and 30 (denoted 0C, 5C, 15C, and 30C, where C represents cycles) using the pooled virome DNA as template, and the PCR products were used as input DNA for library construction for high-throughput sequencing. Second, to determine the reliability of multiple displacement amplification, we performed MDA with the same pooled templates. Third, long-read sequencing by nanopore technology was applied and compared with the Illumina sequencing results. Fourth, the reproducibility of the virome DNA enrichment and amplification procedure was further evaluated using 4 randomly selected independent tubes of fecal subsample per person as starting material, which were distinct from the pooled templates above. Using the validated optimum of 15 PCR cycles in the abovementioned procedure, we performed four independent repetitions of the experiment to evaluate reproducibility. Lastly, the results of bulk metagenomic and viral metagenomic sequencing were compared.

Virus-like particles enrichment and viral DNA extraction

Virus-like particle enrichment and viral DNA extraction followed our previously described protocol with minor modifications[20]. Briefly, 1 mL of Hank’s Balanced Salt Solution (HBSS) without phenol red was added to each tube of feces (0.17 g), and the mixture was vigorously homogenized by vortexing (at least 15 s of pulse vortexing). The samples were centrifuged at 10000 × g for 10 min at 4 ℃. The supernatant was passed sequentially through 0.45 µm and 0.2 µm filters[18]. Then, the samples were ultracentrifuged at 750000 × g for 60 min at 8 ℃ and the pellet was resuspended in 500 µL of HBSS. 120 µL of the resuspension was transferred and treated with 23.4 µL of mixed nuclease [2.4 µL TURBO DNase (4.8 U, Invitrogen), 8 µL RNase A/T1 Mix (16 µg RNase A, 40 U RNase T1, Thermo Scientific), and 1 µL Benzonase (5 U, EMD Millipore)] for 120 min at 37 ℃. Nucleic acid was extracted immediately using the TIANamp viral genome DNA/RNA extraction kit (TIANGEN, China) according to the manufacturer’s instructions.

PCR-based amplification and multiple displacement amplification

The PCR-based amplification was performed according to a previous protocol[18] with minor modifications. Briefly, first-strand synthesis used a Large (Klenow) Fragment kit (New England BioLab, USA; reaction system: 1 µL of 20mM random amplification primer D2_8N 5’-AAGCTAAGACGGCGGTTCGGNNNNNNNN-3’, 2 µL of 5X reaction buffer, 1 µL of 10mM dNTP, 10.5 µL of molecular grade DEPC H₂O, and 4 µL of the pooled virome DNA template described above) with predenaturation at 95 ℃ for 5 min and holding at 4 ℃. The reaction was then immediately transferred to an ice-water mixture, and 1.5 µL of Klenow fragment polymerase solution (0.15 µL 10X Klenow Buffer, 0.5 µL Klenow fragment/2.5 U enzyme, and 0.85 µL DEPC H₂O) was added. Subsequently, the reaction was incubated at 37 ℃ for 1 h. Second-strand synthesis followed the same procedure as first-strand synthesis. The synthesized virome dsDNA was treated with rSAP-Exonuclease-1 enzyme mix (New England BioLab, USA) at 37 ℃ for 1 h to remove excess dNTP and primer D2_8N. Next, single-primer amplification was performed with the Q5® High-Fidelity DNA Polymerase (New England BioLab, USA; reaction system: 8 µL of the second-strand synthesis product, 10 µL 5X Q5 Reaction Buffer, 3 µL 50mM MgCl₂, 1.5 µL 10mM dNTP, 3 µL 20mM D2 primer 5’-AAGCTAAGACGGCGGTTCGG-3’, 1.25 µL Q5 Ultra-Fidelity DNA Polymerase, and 23.25 µL molecular grade DEPC H₂O). Reaction conditions: initial denaturation at 95 ℃ for 5 min; then 0, 5, 15, or 30 cycles (0C, 5C, 15C, and 30C) of 95 ℃ for 30 s, 55 ℃ for 30 s, and 72 ℃ for 1.5 min; a final incubation at 72 ℃ for 10 min; and holding at 15 ℃.

Virome DNA was amplified by MDA according to the manufacturer’s instructions (GenomiPhi V2 Amplification kit, GE Healthcare, Little Chalfont, UK). 9 µL of sample buffer and 1 µL of the previously pooled virome DNA were pre-cooled at 4 ℃. Then, 9 µL of reaction buffer and 1 µL of enzyme mix were added, and amplification proceeded at 30 ℃ for 90 min.

Metagenomic sequencing

The concentration of the cleaned DNA amplified by the Q5 high-fidelity enzyme or by MDA was measured with the Qubit dsDNA HS Assay Kit (ThermoFisher Scientific, Waltham, MA, USA). To meet the minimum input for NGS library construction given the Q5 high-fidelity enzyme product yields, 8-10 parallel amplified products from 5 cycles of amplification and 12-16 parallel products from 0 cycles were mixed, respectively (Figure 1). All short-read shotgun metagenomic sequencing of virome DNA was performed on the Illumina NovaSeq platform following our previous workflow[20]. Briefly, libraries were prepared with a fragment length of approximately 350 bp, and 150 bp paired-end reads were generated in the forward and reverse directions. To meet the minimum input for nanopore library construction given the MDA product yields, 10 amplified products (2 parallel amplified products × 5 samples) were mixed (Figure 1). The long-read nanopore sequencing of virome DNA was performed on the PromethION platform. NEBNext FFPE DNA Repair Mix and NEBNext End repair / dA-tailing Module (New England BioLab, USA) were used for DNA damage repair, end repair, and A-base addition at the 3' end. Native Barcoding Expansion 1-12/13-24 (Oxford Nanopore Technologies, UK) and NEB Blunt/TA Ligase Master Mix (New England BioLab, USA) were used for barcode attachment. Then, the ligation sequencing kit (SQK-LSK109, Oxford Nanopore Technologies, UK) was used for library construction.

Data preprocessing and assembly

For the short-read data, raw metagenomic sequencing reads were filtered and trimmed using fastp v0.20.1[52] with the options ‘-q 20 -u 30 -y -l 90 --trim_poly_g’ to generate high-quality reads. Host contamination reads were removed by querying the high-quality reads against the human genome GRCh38 using Bowtie2 v2.4.1[53]. For the long-read data, raw nanopore reads were filtered using NanoFilt v2.8.0 with the options ‘-q 7 -l 2000’. For each sample, all short-read data were mapped to the long reads using BLASTN v2.11.0, and long reads for which over 20% of the total length was covered by short-read alignments with >80% identity were kept. The high-quality short reads of each sample were assembled into contigs using SPAdes v3.14.1[54] with the options ‘-meta -k 21,33,55,77’. For the hybrid assembly, we developed a custom pipeline to improve the viral metagenomic assembly. Briefly, long reads from each sample were used to join the contigs pre-assembled by SPAdes into scaffolds via SSPACE-LongRead v1-1[55] with the options ‘-i 80 -a 300 -g 1500’. GapFiller v1.11[56] was then used to close gaps within scaffolds based on short reads from the corresponding sample with the options ‘-m 20 -r 0.6 -g 2’. To further reduce the number of gaps, a second gap-filling step was performed based on long reads from the corresponding sample via FGAP v1.8.1[57] with the options ‘-i 80 -C 200 -R 2000 -I 2000’. Contigs with a minimum length of 3,000 bp were extracted from the scaffolds for further analyses.
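The long-read retention rule can be illustrated with a minimal Python sketch. It assumes one reading of the criterion: short-read alignments with >80% identity are merged along the long read, and the read is kept if the merged intervals cover more than 20% of its length. The function names and the input layout (start, end, percent identity on the long read) are hypothetical and are not taken from the study's pipeline.

```python
def covered_fraction(read_length, hits, min_identity=80.0):
    """Fraction of a long read covered by short-read hits above min_identity.

    hits: iterable of (start, end, percent_identity) with 1-based inclusive
    coordinates on the long read; start may exceed end for minus-strand hits.
    """
    # Keep only hits above the identity threshold and normalize orientation.
    intervals = sorted(
        (min(s, e), max(s, e)) for s, e, ident in hits if ident > min_identity
    )
    covered = 0
    cur_start, cur_end = None, None
    for s, e in intervals:
        if cur_end is None or s > cur_end + 1:
            # Close the previous merged interval before starting a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent hit: extend the current interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / read_length


def keep_long_read(read_length, hits, min_fraction=0.20, min_identity=80.0):
    """Assumed retention rule: covered fraction must exceed min_fraction."""
    return covered_fraction(read_length, hits, min_identity) > min_fraction
```

Merging intervals before summing avoids counting bases twice when several short reads align to the same region of a long read.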

Analyses of viral sequences

After assembly, all contigs (≥3,000 bp) were assessed by CheckV v0.7.0[30], and contigs with more than 25% known microbial genes were removed. The remaining contigs were considered viral sequences if they met any of the following criteria: 1) the contig had more viral genes than host genes according to CheckV; 2) the contig had a p-value <0.01 in DeepVirFinder v1.0[58]; 3) the contig was identified by VIBRANT v1.2.1[59]. To decontaminate the catalog of viral sequences, following a previous study[8], we first searched for bacterial universal single-copy orthologs (BUSCO)[60] within the viral sequences using HMMsearch with default options and calculated the ratio of the number of BUSCOs to the total number of genes in each viral sequence (BUSCO ratio). Highly contaminated viral sequences with a BUSCO ratio ≥5% were then removed, and the remaining sequences were retained as the final viral sequences for each sample. The filtered viral sequences were dereplicated in the following steps: 1) pairwise alignments of all viral sequences were performed using BLASTN v2.11.0 with the options ‘-evalue 1e-10 -word_size 20 -num_alignments 10000’; 2) viral sequences that shared 95% nucleotide identity across 75% of the sequence were clustered into a viral operational taxonomic unit (vOTU) using a custom script; 3) the longest viral sequence was selected as the representative sequence of each vOTU.
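The BUSCO-ratio filter and the dereplication step can be sketched as follows. This is an illustrative greedy clustering, not the authors' custom script: the function names, the input layout, and the choice to measure the 75% coverage against the shorter (query) sequence are all assumptions.

```python
def passes_busco_filter(n_busco, n_genes, max_ratio=0.05):
    """Keep a viral sequence only if its BUSCO ratio is below 5%."""
    return n_genes > 0 and n_busco / n_genes < max_ratio


def cluster_votus(lengths, hits, min_identity=95.0, min_coverage=0.75):
    """Greedy, longest-first clustering into vOTUs.

    lengths: {sequence_id: length in bp}
    hits: {(seq_a, seq_b): (percent_identity, aligned_length)} summarizing
          pairwise BLASTN results (hypothetical pre-aggregated input).
    Returns {representative_id: [member_ids]}; the representative is the
    longest sequence of its cluster because sequences are visited by
    decreasing length.
    """
    representatives = []
    clusters = {}
    for seq in sorted(lengths, key=lambda s: (-lengths[s], s)):
        for rep in representatives:
            hit = hits.get((seq, rep)) or hits.get((rep, seq))
            if hit is None:
                continue
            identity, aligned_len = hit
            # Assumption: coverage is measured on the shorter sequence (seq).
            if identity >= min_identity and aligned_len / lengths[seq] >= min_coverage:
                clusters[rep].append(seq)
                break
        else:
            # No representative matched: seq founds a new vOTU.
            representatives.append(seq)
            clusters[seq] = [seq]
    return clusters
```

Visiting sequences from longest to shortest means that each cluster's founding sequence is automatically its longest member, which matches the rule for choosing the representative.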

Taxonomic annotation of viral sequences was implemented based on protein sequence alignment to the combined database derived from Virus-Host DB downloaded in May 2021, crAss-like protein sequences from Guerin’s study[61] and viral protein sequences from Benler’s study[31]. Putative proteins of viral sequences were predicted using Prodigal[62] with the option ‘-meta’, and then assigned to the reference database using DIAMOND v2.0.6.144[63] with the options ‘--query-cover 50 --subject-cover 50 --id 30 --min-score 50 --max-target-seqs 10’. A viral sequence was annotated to the viral family-level taxonomy when over a quarter of its proteins were matched to the same family.
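The family-level assignment rule can be expressed as a short Python sketch. It assumes that "over a quarter" means strictly more than 25% of all predicted proteins of the sequence and that, if more than one family passes, the family with the most matched proteins is chosen; the source does not state how ties are handled, and the function name is hypothetical.

```python
from collections import Counter


def assign_family(protein_families, n_proteins, min_fraction=0.25):
    """Return the family supported by more than min_fraction of proteins.

    protein_families: one family label per protein that had a database hit.
    n_proteins: total number of predicted proteins in the viral sequence.
    Returns None when no family exceeds the threshold.
    """
    if n_proteins == 0 or not protein_families:
        return None
    family, count = Counter(protein_families).most_common(1)[0]
    return family if count / n_proteins > min_fraction else None
```

For example, a sequence with 10 predicted proteins, 3 of which match Siphoviridae, would be assigned to Siphoviridae (30% > 25%), whereas 2 matches out of 10 would leave it unassigned.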

Functional annotation of viral proteins was performed based on the KEGG database using DIAMOND v2.0.6.144 with the options ‘--query-cover 50 --subject-cover 50 -e 1e-5 --min-score 50 --max-target-seqs 50’. In addition, to identify the diversity-generating retroelement (DGR) system, we used DGRscan[64] to identify the template region (TR) and the variable region (VR) within each vOTU. A viral protein was considered a reverse transcriptase if it was annotated to one of the following KEGG Orthologs (KOs): K00986, K11126, K21037, K21038, K23454, K24802, and K25055. The vOTUs were identified as lysogenic or lytic viruses using VIBRANT[59].

The abundance of each vOTU was calculated by aggregating the matching clean reads in each sample using Bowtie2 and SAMtools[65]. The abundance of each family-level population was obtained by aggregating the abundance of the vOTUs assigned to the same family. The relative abundance of each population (vOTU or family) was the ratio of its abundance to the total abundance of all populations in each sample.
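The abundance calculation described above amounts to a sum and a normalization, sketched here in Python for a single sample. The function names and the "unclassified" label for vOTUs without a family assignment are illustrative assumptions.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Divide each population's abundance by the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


def family_counts(votu_counts, votu_family):
    """Sum vOTU abundances by family; unassigned vOTUs are pooled together."""
    totals = defaultdict(int)
    for votu, count in votu_counts.items():
        totals[votu_family.get(votu, "unclassified")] += count
    return dict(totals)
```

Applying `relative_abundance` to the output of `family_counts` gives the family-level profile of the same sample.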

Statistical analyses

All statistical analyses were performed in the R v4.0.2 platform. Alpha diversity indexes were assessed based on the relative abundance profile of vOTUs using the function diversity. PCoA was performed based on the Bray–Curtis distance using the function pcoa in the ape package. Data visualization was carried out using the ggplot2 package. Spearman's correlation coefficient was calculated using the cor.test function. Significance tests were performed using the function wilcox.test with the parameter ‘paired=T’.
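For readers who want to see what these indexes compute, the following Python sketch reproduces the Shannon index, the Simpson index, and the Bray–Curtis distance from abundance vectors. It is an illustration only: the analyses in this study were run in R, and the conventions used here (natural logarithm for Shannon, Simpson reported as 1 minus the sum of squared proportions) are assumptions about the settings used.

```python
import math


def shannon(abundances):
    """Shannon index H = -sum(p * ln p) over non-zero proportions."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def simpson(abundances):
    """Simpson index reported as 1 - sum(p^2)."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


def bray_curtis(x, y):
    """Bray-Curtis distance between two profiles over the same vOTUs."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator
```

A perfectly even community of four vOTUs gives a Shannon index of ln 4 and a Simpson index of 0.75, and two samples that share no vOTUs have a Bray–Curtis distance of 1.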

Experimental process

Each fresh fecal specimen from the five healthy volunteers was thoroughly stirred and divided into 40-50 tubes immediately after defecation, and then stored at -80 ℃. For each specimen, we conducted standard VLP enrichment and DNA extraction (using our previous method[20]) on each of ~30 tubes for follow-up analysis. Our experimental plan and four main hypotheses are shown in Figure 1. First, to explore the optimal cycle number for PCR amplification with the high-fidelity enzyme, we prepared 4 virome DNA samples for each specimen: one sample without amplification (0C; C represents the number of cycles) for direct DNA library construction and three samples with 5, 15, and 30 cycles of amplification (hereafter 5C, 15C, and 30C), respectively (Fig. 1a). Because of the relatively low DNA concentration, the 0C sample was pooled from 12-16 tubes of unamplified virome DNA and the 5C sample was pooled from 8-10 tubes of amplified virome DNA products to obtain enough DNA for Illumina library construction (minimum total DNA amount 0.05 μg; Supplementary Table 1). Then, since 15 cycles of amplification gave the best overall results, we validated the repeatability of the whole viral metagenomic process by preparing 4 parallel virome DNA products for each specimen (15C-0, 15C-1, and 15C-2 from three independent unpooled virome DNA samples, and 15C-3 from the pooled template of Fig. 1a) and amplifying them with 15 PCR cycles (Fig. 1d). Next, we prepared 3 additional amplified virome DNA products per specimen from the identical pooled templates using multiple displacement amplification (MDA) for 90 minutes. One MDA product was used for Illumina-based sequencing and compared with the non-MDA samples (PCR amplification by high-fidelity enzyme) to evaluate the reliability of MDA (Fig. 1b). The other two products were mixed to obtain enough DNA for library construction and sequenced on the nanopore PromethION platform to test the capability of long-read sequencing to improve viral metagenomic assembly (Fig. 1c). Finally, we compared the efficiency and coverage of virus recovery between the viral metagenome and the traditional bulk metagenome (Fig. 1e).

Optimization of the cycle number for PCR-based amplification

For each individual, samples with different numbers of PCR amplification cycles (0C, 5C, 15C-0, 15C-1, 15C-2, 15C-3, and 30C) were compared. DNA library construction and shotgun sequencing succeeded for all samples except one (Subject No. 2 with 5C, #2-5C), which failed (Supplementary Table 1). We found that the 30C samples were significantly lower in both the number and total length of raw metagenome-assembled contigs than the 0C and 15C samples (Fig. 2a; Supplementary Table 2). Similar results were found for the number and total length of identified viral contigs (Fig. 2b). The 5C samples showed unstable results; e.g., #3-5C (Subject No. 3 with 5C) yielded more raw/viral contigs than any other #3 sample, while #4-5C (Subject No. 4 with 5C) yielded fewer contigs than any other #4 sample. We calculated the “recall rate” for each sample, defined as the number of viruses assembled from a sample divided by the number of viruses assembled from all samples of that individual (Fig. 2c). The 0C, 5C, and 15C samples recovered on average 37.8% (range 27.5% to 53.9%), 34.8% (range 11.7% to 69.8%), and 29.7% (range 12.0% to 43.6%) of the viruses in the five fecal specimens, respectively, while the 30C samples recovered on average only 13.0% (range 4.2% to 18.2%). In summary, these findings suggested that the 0C, 5C, and 15C samples could reconstruct a considerably large proportion of the fecal DNA virome; moreover, 15 amplification cycles was the optimal choice considering experimental convenience and the stability of the results. Notably, although more contigs were recovered in the low-cycle (i.e., 0C, 5C, and 15C) samples, neither their N50 length nor the estimated completeness of the viral contigs appeared to increase (Fig. 2b; Supplementary Fig. 1), suggesting that assembly performance could still be improved by other technologies.
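The recall rate defined above can be written as a small Python function. The sketch assumes that each sample is represented by the set of nonredundant virus identifiers assembled from it and that the denominator is the union over all samples of the same individual; the function name and the identifiers in the example are hypothetical.

```python
def recall_rates(sample_viruses):
    """Recall rate per sample for one individual.

    sample_viruses: {sample_name: set of virus identifiers assembled from it}
    Returns {sample_name: fraction of the individual's viruses recovered}.
    """
    all_viruses = set()
    for viruses in sample_viruses.values():
        all_viruses |= viruses
    if not all_viruses:
        return {name: 0.0 for name in sample_viruses}
    return {
        name: len(viruses) / len(all_viruses)
        for name, viruses in sample_viruses.items()
    }
```

With three hypothetical samples holding {a, b, c}, {b, d}, and {a}, the individual has four viruses in total, so the recall rates are 0.75, 0.5, and 0.25.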

To further evaluate potential deviations in viral community structure, we grouped the viral contigs of all samples into a catalog of nonredundant viruses at 95% nucleotide similarity[29] and then profiled the viral composition of each sample by mapping the sequencing reads against this catalog. Analysis of within-sample viral diversity based on the viral profiles revealed that the 30C samples had the lowest diversity (in all three diversity parameters) compared with samples amplified with other numbers of cycles, while the 15C samples had Shannon and Simpson indexes almost equal to those of the 0C and 5C samples (Fig. 3d). Principal coordinates analysis (PCoA) and Spearman correlation analyses of the viral profiles showed that all samples belonging to the same individual clustered closely (Fig. 3e; Supplementary Fig. 2), demonstrating the high consistency of viral composition across different numbers of amplification cycles.

Validation of technical replication

The four 15C repetitions of each fecal specimen, prepared with the same experimental procedures, differed markedly in DNA concentration and total DNA amount, but their data yield and raw metagenomic assembly performance were not significantly different (coefficient of variation [CV] <40% for each sample; Supplementary Tables 1-2). Likewise, the number of identified viral contigs and their assembly parameters did not differ among repetitions. For the 4 repetitions from each individual, the proportion of viral reads and the within-sample diversity parameters (i.e., Shannon and Simpson indexes, observed number of viruses) were similar (CV <40% for all; Supplementary Fig. 3a). PCoA showed that the 4 samples of the same specimen clustered together (PERMANOVA R² = 0.97, p<0.001) and differed significantly from samples of other individuals (Supplementary Fig. 3b). Similarly, Spearman correlation analysis revealed that samples of the same specimen were highly consistent (ρ = 0.91±0.07, range 0.73 to 0.98), while samples from different specimens were weakly correlated (ρ = 0.04±0.23, range -0.27 to 0.42) (Supplementary Fig. 3c). These findings demonstrated that independent repetitions of the viral metagenome procedure yield highly reproducible results.
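The coefficient of variation used as the reproducibility criterion is the standard deviation divided by the mean. A minimal Python sketch follows; it assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the example values are invented for illustration.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean
```

For instance, hypothetical replicate measurements of 8, 10, and 12 have a mean of 10 and a sample standard deviation of 2, giving a CV of 20%, which would satisfy the <40% criterion.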

Evaluation of MDA

The MDA samples did not differ significantly from the 5C, 15C, and 30C samples in raw metagenomic assembly performance, but their number and total length of raw contigs were lower than those of the 0C samples (Supplementary Fig. 4a; Supplementary Table 2). Similarly, the viral identification results of the MDA samples were not very different from those of the non-MDA samples (Supplementary Fig. 4b). The “recall rate” (updated by adding the new samples) of the MDA samples averaged 22.6% (range 12.0% to 29.9%), which was slightly lower than that of the 15C samples (average 27.2%) but markedly higher than that of the 30C samples (average 12.1%) (Supplementary Fig. 4c). Moreover, within each individual, the MDA samples were highly consistent with the non-MDA samples in both viral diversity and composition (Supplementary Fig. 5).

Combining short and long reads improves viral metagenome assembly

Preprocessing of the Nanopore sequencing data yielded, on average, 266,026 (range 77,682 to 431,229) long reads per sample, with an average read length of 5,071 bp and an average quality score of 8.8 (corresponding to a base accuracy of 86.8%; Supplementary Table 1). Notably, an average of 96.3% of the short reads could be robustly mapped to the Nanopore long reads for each individual, confirming that the long reads represented the DNA virome of the original specimens well. Using this large amount of viral sequencing data, we first performed a state-of-the-art short-read assembly for each individual using the combined data of the four 15C samples and then used the long reads to scaffold the short-read-assembled contigs into a hybrid result, followed by gap-filling based on both short and long reads (see Methods). The hybrid assembly significantly improved metagenomic assembly performance compared with the short-read approach, with an average increase of 15.8% (n = 2,743 vs. 2,369) in the number of raw sequences and 19.8% (24.3 Mbp vs. 20.3 Mbp) in total length (Fig. 3a; Supplementary Table 3). Likewise, the number and total length of viral sequences increased on average by 23.6% (n = 545 vs. 441) and 26.3% (5.1 Mbp vs. 4.0 Mbp), respectively (Fig. 3b). Moreover, although the N50 length of these viruses did not increase, the number of high-quality viruses (>90% completeness as estimated by CheckV[30]) increased remarkably in the hybrid assemblies of each sample (Fig. 3c). These findings indicated that the long reads were effective in improving viral metagenome assembly and viral genome reconstruction.
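The correspondence between the average quality score and the base accuracy follows from the Phred definition, Q = -10·log10(P_error). The short Python check below applies that standard conversion; the function name is illustrative.

```python
def phred_to_accuracy(q):
    """Convert a Phred quality score to per-base accuracy (1 - error rate)."""
    return 1.0 - 10 ** (-q / 10.0)


# A quality score of 8.8 corresponds to an error rate of about 13.2%.
accuracy_q88 = phred_to_accuracy(8.8)
```

Here `accuracy_q88` evaluates to roughly 0.868, consistent with the 86.8% base accuracy reported for the Nanopore reads.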

Genomic and functional characterization of high-quality viruses

Next, we focused specifically on the high-quality viruses (n = 151; average length 50,952 bp; N50 length 60,043 bp; length range 3,376 to 206,128 bp; Supplementary Table 4) generated by the hybrid assembly, as these viruses accounted for the dominant proportion (>75%) of viral relative abundance in the original fecal specimens. The average estimated completeness of these viruses was 99.4% (range 90.9% to 100%), and 72 of them were identified as “finished” genomes because they contained high-confidence direct terminal repeat (DTR) or inverted terminal repeat (ITR) sequences. Of the 151 viruses, 70 could be robustly assigned to viral families, among which members of Siphoviridae (n = 34), Microviridae (n = 10), and Myoviridae (n = 6) occurred most frequently (Fig. 4a), in agreement with previous studies reporting that these three families dominate the human gut virome[15]. Almost all known viruses were prokaryotic viruses, except for 3 eukaryotic viruses (Circoviridae, n = 2; Geminiviridae, n = 1). We also assembled near-complete genomes of 6 viruses belonging to a candidate viral family, “Quimbyviridae”[31], suggesting that this family is probably widespread in the gut virome. To further investigate the novelty of our virus catalog, we compared the viral genomes with three large-scale human gut virus datasets: the Gut Virome Database (GVD)[8], Gut Phage Database (GPD)[3], and Metagenomic Gut Virus catalog (MGV)[6]. Of our viruses, 60.3% (91/151) were completely absent from all three databases (Fig. 4a), including all 10 members of Microviridae, highlighting that many unknown viruses remain to be identified in the human gut. In addition, we identified the bacterial hosts of 63 of the 151 viruses based on the homology of their genome sequences or CRISPR spacers to available gut microbial genomes. This analysis revealed some novel virus-host affiliations, such as 4 virus-host pairs between “Quimbyviridae” members and Prevotella spp. and a virus-host pair between a Microviridae virus and a Cyanobacteria species (Supplementary Fig. 6). Members of Firmicutes and Bacteroidetes were the most frequent hosts of the virus catalog (Fig. 4a), consistent with previous studies showing that these phyla are the most dominant in the healthy human gut[1, 32].

We predicted a total of 10,951 protein-coding genes from the high-quality viruses and annotated the functions of 939 of these genes based on the KEGG (Kyoto Encyclopedia of Genes and Genomes)[33] database. A total of 206 viral auxiliary metabolic genes (AMGs) that were assigned to specific metabolic pathways were further analyzed to elucidate the metabolic capabilities of the viruses (Supplementary Table 5). Strikingly, 22.7% of the viral AMGs were involved in sulfur metabolism (Fig. 4b), in agreement with recent reports that viruses participate widely in both organic and inorganic sulfur metabolism in the human gut[34, 35]. Proteins involved in the degradation of peptidoglycan (an important structural component of bacterial cell walls) were also frequently encoded by the viruses (accounting for 11.4% of AMGs); one example is peptidoglycan DL-endopeptidase, which functions as both a cell wall hydrolase and a poly-γ-glutamic acid hydrolase[36] and would facilitate infection of the bacterial host and the fitness of such viruses. Besides these, we also found that the viruses encoded several important but rarely reported functions, including enzymes involved in nicotinate and nicotinamide metabolism, folate metabolism, and the metabolism of other molecules (e.g., lipopolysaccharide, glycerophospholipid, pantothenate, and porphyrin). These findings largely extend the known functional capacity of the gut virome.

Virus identification in bulk metagenome versus viral metagenome

An average of 814 viral contigs (range 479 to 1,147), with an average total viral length of 6.8 Mbp (range 3.9 Mbp to 9.6 Mbp), were generated from the bulk metagenome samples (Supplementary Fig. 7a); these values were 49.4% and 33.5% larger, respectively, than those of the hybrid-assembled viral metagenome samples. However, the N50 length and estimated completeness of the bulk samples were significantly lower than those of the VLP samples (Supplementary Fig. 7b), probably because of the lower proportion of viruses in bulk samples. Surprisingly, on average only 16.4% (range 10.9% to 22.3%) of the viruses identified in the VLP metagenomes were shared with the viruses of the bulk metagenomes (Fig. 5a). Further comparison at the family level between the VLP-specific and bulk-specific viruses revealed that, although both types were dominated by Siphoviridae and Myoviridae, they differed significantly in the frequency of some families (Fig. 5b). For example, 20 crAss-like phages were recovered from the viral metagenomes but only 3 were assembled from the bulk samples. Also, the VLP metagenomes uniquely recovered all Microviridae (n = 13), Circoviridae (n = 3), Drexlerviridae (n = 2), and Genomoviridae (n = 2) viruses, whereas the bulk metagenomes specifically recovered Ackermannviridae (n = 5), Herpesviridae (n = 3), and Pithoviridae (n = 2). Moreover, an average of 2.4% (range 1.3% to 4.3%) of the viruses in the viral metagenomes were recognized as prophages by the CheckV prophage algorithm, whereas this proportion averaged 5.4% (range 3.8% to 8.1%) among the bulk viruses (Fig. 5c).
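
The per-individual sharing percentages above can be pictured as a simple set overlap. The sketch below assumes that the VLP and bulk viruses of each individual have already been clustered so that the same virus carries the same identifier in both datasets. The identifiers, the function name `shared_fraction`, and the toy data are hypothetical, and the clustering criteria actually used are not described in this section.

```python
from statistics import mean


def shared_fraction(vlp_ids, bulk_ids):
    """Fraction of VLP-derived viruses that also occur in the bulk metagenome."""
    vlp = set(vlp_ids)
    if not vlp:
        raise ValueError("the VLP virus set is empty")
    return len(vlp & set(bulk_ids)) / len(vlp)


# Hypothetical cluster identifiers for two individuals (illustration only).
samples = {
    "individual_A": {"vlp": ["v1", "v2", "v3", "v4", "v5"], "bulk": ["v2", "v9", "v10"]},
    "individual_B": {"vlp": ["v1", "v6", "v7", "v8"], "bulk": ["v6", "v11"]},
}

fractions = {
    name: shared_fraction(ids["vlp"], ids["bulk"]) for name, ids in samples.items()
}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of VLP viruses shared with bulk")
print(f"mean across individuals: {mean(fractions.values()):.1%}")
```

Note that the denominator is the VLP virus set, matching the wording "of the viruses identified in the VLP metagenomes"; using the bulk set or the union as the denominator would give different percentages.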

Finally, we compared the viral profiles of the viral and bulk samples to investigate differences in viral diversity and structure between the two technologies. Bulk samples showed higher within-sample diversity than the viral metagenome samples (Fig. 5d). Likewise, PCoA and Spearman correlation analyses showed that the viral profiles of viral metagenome and bulk metagenome samples from the same individuals differed markedly, with a Spearman correlation coefficient of ρ = 0.67±0.11 (Fig. 5e-f), and this pattern was also observed in viral composition at the family level (Fig. 5g). Collectively, our findings suggest a considerable difference between the two technologies in profiling the DNA virome of the human gut.
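
As an illustration of the two kinds of comparison described above, the sketch below computes a within-sample diversity index and a Spearman correlation between two abundance profiles using only the Python standard library. The Shannon index with natural logarithm is an assumption (this section does not state which index underlies Fig. 5d), and the function names and toy profiles are hypothetical.

```python
import math


def shannon(abundances):
    """Shannon index (natural log) of a vector of non-negative abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are tied
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


# Toy relative abundances of the same five viruses in two preparations
# from one individual (invented for illustration only).
vlp_profile = [0.50, 0.30, 0.15, 0.05, 0.00]
bulk_profile = [0.20, 0.35, 0.25, 0.10, 0.10]
print(f"Shannon (VLP):  {shannon(vlp_profile):.3f}")
print(f"Shannon (bulk): {shannon(bulk_profile):.3f}")
print(f"Spearman rho:   {spearman(vlp_profile, bulk_profile):.3f}")
```

Spearman correlation is computed on ranks, so it is insensitive to the very different abundance scales that VLP enrichment and bulk sequencing can produce for the same virus.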

Discussion

With the emergence of VLP enrichment technology and the rapid development of high-throughput sequencing, virome research, especially on the DNA virome, has received widespread attention[24]. Gut virome alterations specific to diseases such as inflammatory bowel disease (IBD)[15, 37, 38], irritable bowel syndrome (IBS)[39], acute malnutrition[40], and childhood obesity and metabolic syndrome[41] have been widely reported. However, a critical limitation of human gut virome studies is that the different protocols (covering virus-like particle enrichment, nucleic acid purification, and sequencing strategies) adopted by different research groups have led to general discrepancies in results[42]. In this study, we developed experimental and sequencing strategies that aim to improve the viral metagenomic method for genome-level characterization of human fecal samples. Several studies have spiked in viruses or used mock communities of known composition to evaluate the accuracy or reproducibility of experimental procedures on complex biological samples[9, 43, 44]. However, because viruses differ greatly in particle size, overall charge, envelope, isoelectric point, icosahedral capsid shape, and tail, the limited set of viruses available for such artificial samples cannot fully represent the virome diversity of the human gut[42]. The strength of using actual biological samples, such as feces, is that they can reveal diversity and variability that may not be apparent in low-diversity mock communities.

The main experimental steps in virome sample preparation are the concentration of viral particles, the elimination of contaminating cells and free nucleic acids, and the extraction, amplification, and purification of viral nucleic acids[17]. Herein, we applied different amplification and sequencing methods to fecal samples from five adults to identify viruses at the best genome-level resolution that could be used in human gut virome investigations. Because the average yield of VLP DNA from 2-5 g of feces is only 500 ng[2], random amplification of virome nucleic acids is essential to obtain sufficient material from low-biomass samples for metagenomic shotgun sequencing. We selected two amplification approaches: Q5® High-Fidelity DNA Polymerase and the GenomiPhi V2 Amplification kit for MDA, the latter being commonly used in current virome research[44, 45]. However, MDA is known to severely decrease genetic diversity and reproducibility and can produce a large excess of ssDNA viruses[21, 46]. Although the performance of raw metagenomic assembly did not differ significantly from that of the Q5 high-fidelity enzyme samples, our results showed that MDA samples had a lower “recall rate” than the 15-cycle (15C) Q5 high-fidelity enzyme samples within each individual, which indicates a shortcoming of MDA.

Stochastic or systematic biases may be associated with the extent of random amplification[24]; thus, the number of amplification cycles is an important parameter for viral metagenomic analysis. The number of PCR cycles generally depends on the viral load. Because polymerases can introduce mutations, fewer cycles are recommended for high viral loads: the fewer the cycles, the less the bias. A total of 30 cycles generally works for any stool sample[47]. To avoid initial bias, we used replicate aliquots of the same DNA extract for all amplification processes in parallel. In our results, all 0C, 5C, and 15C samples yielded a satisfactory number and total length of identified viral contigs, had almost equal Shannon and Simpson indexes, and reconstructed a considerably large proportion of the fecal DNA virome. However, because of the low product yield per reaction of the 0C and 5C samples, we needed to combine at least 12 or 8 parallel samples, respectively, to obtain sufficient material for metagenomic sequencing, which is laborious and increases the workload. Moreover, DNA samples of low concentration degrade easily and are difficult to store, resulting in a high failure rate in library construction (e.g., the #2-5C sample in this study). Therefore, 15 amplification cycles is the optimal choice considering experimental convenience and stability of the results. In addition, a previously published amplification method reported viral contigs with a median length of around 5 kb[46], whereas our 15C results had a median contig length of ~11 kb, showing its advantage. Based on the yield of high-molecular-weight DNA suitable for metagenomic sequencing, assembly performance, cost, and hands-on time, we selected the Q5 high-fidelity enzyme with 15 amplification cycles (15C) as the basis for our virome DNA amplification.

Because nanopore sequencing requires larger amounts of nucleic acid than Illumina short-read sequencing, amplification is even harder to avoid. Nanopore sequencing is characterized by long read lengths (>30 kb), yet, apart from MDA, PCR enzymes generally yield amplification products shorter than 10 kb. Therefore, MDA is the most suitable method for preparing nucleic acid material for nanopore sequencing. The advantage of hybrid assembly combining short-read and long-read sequencing, which yields more high-quality viral genomes, has been widely reported[48, 49]. Herein, we conducted a long-read shotgun metagenomic experiment on MDA-amplified DNA using the PromethION nanopore sequencer to study the gut virome. Our results indicated that hybrid assembly based on short- and long-read data was effective in improving viral metagenomic assembly and genome reconstruction.

The novelty of our virus catalog was demonstrated by querying it against three large-scale human gut virus datasets: GVD, GPD, and MGV. The small number of eukaryotic viruses may be related to the filtration step in the virus enrichment process. Interestingly, we found that a large proportion of viral AMGs was involved in sulfur metabolism, in agreement with recent reports that viruses participate widely in both organic and inorganic sulfur metabolism in the human gut[34, 35]. Sulfide provides a fitness advantage to viruses, and viruses are also drivers of organosulfur metabolism, with important implications for human health[34]. Since most VLPs were bacteriophages, which infect bacterial hosts, it is consistent that peptidoglycan metabolism-related enzymes were frequently encoded among our viral AMGs (up to 11.4%); one example is peptidoglycan DL-endopeptidase, which functions as both a cell wall hydrolase and a poly-γ-glutamic acid hydrolase[36] and may facilitate interaction with the bacterial host. Although the mechanism is still unclear, other research has shown that bacterial lipopolysaccharide or peptidoglycan seems to be used by viruses to protect themselves[50]. Other enzymes present among our viral AMGs, involved in nicotinate and nicotinamide metabolism, folate metabolism, and the metabolism of other molecules (e.g., glycerophospholipid, pantothenate, and porphyrin), have been reported in the gut microbiome[51] yet rarely in viral research. Horizontal gene transfer may help deliver these functional genes from viruses to their hosts, thereby further affecting intestinal physiological function. These findings largely extend the known functional capacity of the gut virome and reinforce the necessity of incorporating viral contributions into further research on gut microbial function.

Finally, we evaluated and compared the results of the viral metagenomic and bulk metagenomic approaches. On average, only 16.4% of the viruses identified in the viral metagenomes were shared with those of the bulk metagenomes, suggesting that the two methods are complementary. In addition, an average of 2.4% of VLP viruses were recognized as prophages, whereas this proportion averaged 5.4% among bulk viruses. The main reason may be that prophage genomes are mostly integrated into the genomes of their bacterial hosts, and viral metagenomic analysis removes hosts such as bacteria through steps such as filtration. Therefore, viral metagenomic sequencing has inherent limitations for the study of prophages. By contrast, prophages in bacterial genomes are captured in bulk metagenomic analysis because they are retrieved when the bacterial fraction of the samples is sequenced. Our results also suggested a considerable difference between these two technologies in profiling the virome of the human gut ecosystem. Thus, for more comprehensive virome information, the two methods should be combined.

Conclusions

Overall, we developed an improved and reproducible workflow that combines Illumina sequencing of DNA amplified with a high-fidelity enzyme for 15 PCR cycles and nanopore sequencing of MDA-amplified DNA to uncover hundreds of high-quality viral genomes from five fecal specimens. This work provides methods for virome studies of fecal samples.

Ethics approval and consent to participate

Ethical approval for this study was obtained from the ethics committee of Dalian Medical University [File No 2020-014], and informed consent was obtained from all participants.

Consent for publication

Not applicable.

Availability of data and materials

The dataset supporting the conclusions of this article is available in the CNGB Sequence Archive (CNSA) of China National GeneBank DataBase (CNGBdb) repository with accession number CNP0002393 (https://db.cngb.org/mycngbdb/submissions/project).

Competing interests

The authors declare that they have no competing interests.

Funding

The authors thank the National Natural Science Foundation of China (No. 81930112, 81902037), the National Key R&D Program of China (2018YFC1705900), the Distinguished Professor of Liaoning Province program (XLYC2002008), the Dalian Science and Technology Leading Talents Project (2019RD15), the Dalian Science and Technology Innovation Fund (2020JJ27SN069), and the “1+X” program for Clinical Competency Enhancement–Interdisciplinary Innovation Project, Second Hospital of Dalian Medical University.

Authors’ contributions

YM, XM, QY, SL, GW, and RG conceived and designed the study. QY, SL, GW, RG, and FC conducted the laboratory work and developed the protocol. SL, RG, YZ, FC, and QL performed the data analysis. QY and SL wrote the original draft; YM, XM, and SL reviewed and edited the manuscript; YM, XM, and QY acquired the funding; YM and XM supervised the work. All authors read and approved the final manuscript.

Acknowledgements

Not applicable.

Author details

1. Department of Microbiology, College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China

2. Pharmaceutical Research Center, Second Affiliated Hospital, Dalian Medical University, Dalian, China

3. Puensum Genetech Institute, Wuhan 430076, China

4. Key Laboratory of Precision Nutrition and Food Quality, Department of Nutrition and Health, China Agricultural University, Beijing 100083, China

Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T et al: A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464(7285):59–65.
Reyes A, Haynes M, Hanson N, Angly FE, Heath AC, Rohwer F, Gordon JI: Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 2010, 466(7304):334–338.
Camarillo-Guerrero LF, Almeida A, Rangel-Pineros G, Finn RD, Lawley TD: Massive expansion of human gut bacteriophage diversity. Cell 2021, 184(4):1098-1109 e1099.
Castro-Mejia JL, Muhammed MK, Kot W, Neve H, Franz CM, Hansen LH, Vogensen FK, Nielsen DS: Optimizing protocols for extraction of bacteriophages prior to metagenomic analyses of phage communities in the human gut. Microbiome 2015, 3:64.
Moreno-Gallego JL, Chou SP, Di Rienzi SC, Goodrich JK, Spector TD, Bell JT, Youngblut ND, Hewson I, Reyes A, Ley RE: Virome Diversity Correlates with Intestinal Microbiome Diversity in Adult Monozygotic Twins. Cell Host Microbe 2019, 25(2):261-272 e265.
Nayfach S, Paez-Espino D, Call L, Low SJ, Sberro H, Ivanova NN, Proal AD, Fischbach MA, Bhatt AS, Hugenholtz P et al: Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome. Nat Microbiol 2021, 6(7):960–970.
Soto-Perez P, Bisanz JE, Berry JD, Lam KN, Bondy-Denomy J, Turnbaugh PJ: CRISPR-Cas System of a Prevalent Human Gut Bacterium Reveals Hyper-targeting against Phages in a Human Virome Catalog. Cell Host Microbe 2019, 26(3):325-335 e325.
Gregory AC, Zablocki O, Zayed AA, Howell A, Bolduc B, Sullivan MB: The Gut Virome Database Reveals Age-Dependent Patterns of Virome Diversity in the Human Gut. Cell Host Microbe 2020, 28(5):724-740 e728.
Langenfeld K, Chin K, Roy A, Wigginton K, Duhaime MB: Comparison of ultrafiltration and iron chloride flocculation in the preparation of aquatic viromes from contrasting sample types. PeerJ 2021, 9:e11111.
Shkoporov AN, Clooney AG, Sutton TDS, Ryan FJ, Daly KM, Nolan JA, McDonnell SA, Khokhlova EV, Draper LA, Forde A et al: The Human Gut Virome Is Highly Diverse, Stable, and Individual Specific. Cell Host Microbe 2019, 26(4):527-541 e525.
Mangalea MR, Paez-Espino D, Kieft K, Chatterjee A, Chriswell ME, Seifert JA, Feser ML, Demoruelle MK, Sakatos A, Anantharaman K et al: Individuals at risk for rheumatoid arthritis harbor differential intestinal bacteriophage communities with distinct metabolic potential. Cell Host Microbe 2021, 29(5):726-739 e725.
Lang S, Demir M, Martin A, Jiang L, Zhang X, Duan Y, Gao B, Wisplinghoff H, Kasper P, Roderburg C et al: Intestinal Virome Signature Associated With Severity of Nonalcoholic Fatty Liver Disease. Gastroenterology 2020, 159(5):1839–1852.
Jiang L, Lang S, Duan Y, Zhang X, Gao B, Chopyk J, Schwanemann LK, Ventura-Cots M, Bataller R, Bosques-Padilla F et al: Intestinal Virome in Patients With Alcoholic Hepatitis. Hepatology 2020, 72(6):2182–2196.
Nakatsu G, Zhou H, Wu WKK, Wong SH, Coker OO, Dai Z, Li X, Szeto CH, Sugimura N, Lam TY et al: Alterations in Enteric Virome Are Associated With Colorectal Cancer and Survival Outcomes. Gastroenterology 2018, 155(2):529-541 e525.
Clooney AG, Sutton TDS, Shkoporov AN, Holohan RK, Daly KM, O'Regan O, Ryan FJ, Draper LA, Plevy SE, Ross RP et al: Whole-Virome Analysis Sheds Light on Viral Dark Matter in Inflammatory Bowel Disease. Cell Host Microbe 2019, 26(6):764-778 e765.
Yang K, Niu J, Zuo T, Sun Y, Xu Z, Tang W, Liu Q, Zhang J, Ng EK, Wong SK et al: Alterations in the gut virome in obesity and type 2 diabetes mellitus. Gastroenterology 2021.
Thurber RV, Haynes M, Breitbart M, Wegley L, Rohwer F: Laboratory procedures to generate viral metagenomes. Nat Protoc 2009, 4(4):470–483.
Liang G, Zhao C, Zhang H, Mattei L, Sherrill-Mix S, Bittinger K, Kessler LR, Wu GD, Baldassano RN, DeRusso P et al: The stepwise assembly of the neonatal virome is modulated by breastfeeding. Nature 2020, 581(7809):470–474.
Yan Q, Wang Y, Chen X, Jin H, Wang G, Guan K, Zhang Y, Zhang P, Ayaz T, Liang Y et al: Characterization of the gut DNA and RNA Viromes in a Cohort of Chinese Residents and Visiting Pakistanis. Virus Evol 2021, 7(1):veab022.
Callanan J, Stockdale SR, Shkoporov A, Draper LA, Ross RP, Hill C: Biases in Viral Metagenomics-Based Detection, Cataloguing and Quantification of Bacteriophage Genomes in Human Faeces, a Review. Microorganisms 2021, 9(3).
Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ: Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+ C)-biased genomes. Nature methods 2009, 6(4):291–295.
Rodrigue S, Malmstrom RR, Berlin AM, Birren BW, Henn MR, Chisholm SW: Whole genome amplification and de novo assembly of single bacterial cells. PloS one 2009, 4(9):e6864.
Parras-Molto M, Rodriguez-Galet A, Suarez-Rodriguez P, Lopez-Bueno A: Evaluation of bias induced by viral enrichment and random amplification protocols in metagenomic surveys of saliva DNA viruses. Microbiome 2018, 6(1):119.
de la Cruz Pena MJ, Martinez-Hernandez F, Garcia-Heredia I, Lluesma Gomez M, Fornas O, Martinez-Garcia M: Deciphering the Human Virome with Single-Virus Genomics and Metagenomics. Viruses 2018, 10(3).
Ballantyne KN, van Oorschot RA, Muharam I, van Daal A, John Mitchell R: Decreasing amplification bias associated with multiple displacement amplification and short tandem repeat genotyping. Anal Biochem 2007, 368(2):222–229.
Warwick-Dugdale J, Solonenko N, Moore K, Chittick L, Gregory AC, Allen MJ, Sullivan MB, Temperton B: Long-read viral metagenomics captures abundant and microdiverse viral populations and their niche-defining genomic islands. PeerJ 2019, 7:e6800.
Yahara K, Suzuki M, Hirabayashi A, Suda W, Hattori M, Suzuki Y, Okazaki Y: Long-read metagenomics using PromethION uncovers oral bacteriophages and their interaction with host bacteria. Nat Commun 2021, 12(1):27.
Gregory AC, Zayed AA, Conceicao-Neto N, Temperton B, Bolduc B, Alberti A, Ardyna M, Arkhipova K, Carmichael M, Cruaud C et al: Marine DNA Viral Macro- and Microdiversity from Pole to Pole. Cell 2019, 177(5):1109-1123 e1114.
Nayfach S, Camargo AP, Schulz F, Eloe-Fadrosh E, Roux S, Kyrpides NC: CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat Biotechnol 2020.
Benler S, Yutin N, Antipov D, Rayko M, Shmakov S, Gussow AB, Pevzner P, Koonin EV: Thousands of previously unknown phages discovered in whole-community human gut metagenomes. Microbiome 2021, 9(1):78.
Costea PI, Hildebrand F, Arumugam M, Backhed F, Blaser MJ, Bushman FD, de Vos WM, Ehrlich SD, Fraser CM, Hattori M et al: Enterotypes in the landscape of gut microbial community composition. Nat Microbiol 2018, 3(1):8–16.
Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K: KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 2017, 45(D1):D353-D361.
Kieft K, Breister AM, Huss P, Linz AM, Zanetakos E, Zhou Z, Rahlff J, Esser SP, Probst AJ, Raman S: Virus-associated organosulfur metabolism in human and environmental systems. bioRxiv 2021.
Kieft K, Zhou Z, Anderson RE, Buchan A, Campbell BJ, Hallam SJ, Hess M, Sullivan MB, Walsh DA, Roux S et al: Ecology of inorganic sulfur auxiliary metabolism in widespread bacteriophages. Nat Commun 2021, 12(1):3503.
Fukushima T, Uchida N, Ide M, Kodama T, Sekiguchi J: DL-endopeptidases function as both cell wall hydrolases and poly-gamma-glutamic acid hydrolases. Microbiology (Reading) 2018, 164(3):277–286.
Liang G, Conrad MA, Kelsen JR, Kessler LR, Breton J, Albenberg LG, Marakos S, Galgano A, Devas N, Erlichman J et al: Dynamics of the Stool Virome in Very Early-Onset Inflammatory Bowel Disease. J Crohns Colitis 2020, 14(11):1600–1610.
Zuo T, Lu XJ, Zhang Y, Cheung CP, Lam S, Zhang F, Tang W, Ching JYL, Zhao R, Chan PKS et al: Gut mucosal virome alterations in ulcerative colitis. Gut 2019, 68(7):1169–1179.
Coughlan S, Das A, O'Herlihy E, Shanahan F, O'Toole PW, Jeffery IB: The gut virome in Irritable Bowel Syndrome differs from that of controls. Gut Microbes 2021, 13(1):1–15.
Reyes A, Blanton LV, Cao S, Zhao G, Manary M, Trehan I, Smith MI, Wang D, Virgin HW, Rohwer F et al: Gut DNA viromes of Malawian twins discordant for severe acute malnutrition. Proc Natl Acad Sci U S A 2015, 112(38):11941–11946.
Bikel S, Lopez-Leal G, Cornejo-Granados F, Gallardo-Becerra L, Garcia-Lopez R, Sanchez F, Equihua-Medina E, Ochoa-Romo JP, Lopez-Contreras BE, Canizales-Quinteros S et al: Gut dsDNA virome shows diversity and richness alterations associated with childhood obesity and metabolic syndrome. iScience 2021, 24(8):102900.
Shkoporov AN, Ryan FJ, Draper LA, Forde A, Stockdale SR, Daly KM, McDonnell SA, Nolan JA, Sutton TDS, Dalmasso M et al: Reproducible protocols for metagenomic analysis of human faecal phageomes. Microbiome 2018, 6(1):68.
Kleiner M, Hooper LV, Duerkop BA: Evaluation of methods to purify virus-like particles for metagenomic sequencing of intestinal viromes. BMC Genomics 2015, 16:7.
Conceicao-Neto N, Zeller M, Lefrere H, De Bruyn P, Beller L, Deboutte W, Yinda CK, Lavigne R, Maes P, Van Ranst M et al: Modular approach to customise sample preparation procedures for viral metagenomics: a reproducible protocol for virome analysis. Sci Rep 2015, 5:16532.
Roux S, Solonenko NE, Dang VT, Poulos BT, Schwenck SM, Goldsmith DB, Coleman ML, Breitbart M, Sullivan MB: Towards quantitative viromics for both double-stranded and single-stranded DNA viruses. PeerJ 2016, 4:e2777.
d'Humieres C, Touchon M, Dion S, Cury J, Ghozlane A, Garcia-Garcera M, Bouchier C, Ma L, Denamur E, Rocha EPC: A simple, reproducible and cost-effective procedure to analyse gut phageome: from phage isolation to bioinformatic approach. Sci Rep 2019, 9(1):11331.
Maghini DG, Moss EL, Vance SE, Bhatt AS: Improved high-molecular-weight DNA extraction, nanopore sequencing and metagenomic assembly from the human gut microbiome. Nat Protoc 2021, 16(1):458–471.
Cook R, Hooton S, Trivedi U, King L, Dodd CER, Hobman JL, Stekel DJ, Jones MA, Millard AD: Hybrid assembly of an agricultural slurry virome reveals a diverse and stable community with the potential to alter the metabolism and virulence of veterinary pathogens. Microbiome 2021, 9(1):65.
Waldman P, Meseguer A, Lucas F, Moulin L, Wurtzer S: Interaction of Human Enteric Viruses with Microbial Compounds: Implication for Virus Persistence and Disinfection Treatments. Environ Sci Technol 2017, 51(23):13633–13640.
Blacher E, Bashiardes S, Shapiro H, Rothschild D, Mor U, Dori-Bachash M, Kleimeyer C, Moresi C, Harnik Y, Zur M et al: Potential roles of gut microbiome and metabolites in modulating ALS in mice. Nature 2019, 572(7770):474–480.
Chen S, Zhou Y, Chen Y, Gu J: fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34(17):i884-i890.
Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nature methods 2012, 9(4):357–359.
Nurk S, Meleshko D, Korobeynikov A, Pevzner PA: metaSPAdes: a new versatile metagenomic assembler. Genome research 2017, 27(5):824–834.
Boetzer M, Pirovano W: SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC bioinformatics 2014, 15(1):1–9.
Nadalin F, Vezzi F, Policriti A: GapFiller: a de novo assembly approach to fill the gap within paired reads. BMC bioinformatics 2012, 13(14):1–16.
Piro VC, Faoro H, Weiss VA, Steffens MB, Pedrosa FO, Souza EM, Raittz RT: FGAP: an automated gap closing tool. BMC research notes 2014, 7(1):1–5.
Ren J, Song K, Deng C, Ahlgren NA, Fuhrman JA, Li Y, Xie X, Poplin R, Sun F: Identifying viruses from metagenomic data using deep learning. Quant Biol 2020, 8(1):64–77.
Kieft K, Zhou Z, Anantharaman K: VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 2020, 8(1):90.
Manni M, Berkeley MR, Seppey M, Simao FA, Zdobnov EM: BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. arXiv preprint arXiv:2106.11799 2021.
Guerin E, Shkoporov A, Stockdale SR, Clooney AG, Ryan FJ, Sutton TDS, Draper LA, Gonzalez-Tortuero E, Ross RP, Hill C: Biology and Taxonomy of crAss-like Bacteriophages, the Most Abundant Virus in the Human Gut. Cell host & microbe 2018, 24(5):653-664 e656.
Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ: Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC bioinformatics 2010, 11(1):1–11.
Buchfink B, Xie C, Huson DH: Fast and sensitive protein alignment using DIAMOND. Nature methods 2015, 12(1):59–60.
Ye Y: Identification of diversity-generating retroelements in human microbiomes. International journal of molecular sciences 2014, 15(8):14234–14246.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The sequence alignment/map format and SAMtools. Bioinformatics 2009, 25(16):2078–2079.
