Population characteristics and study design
This study included 30 Chinese residents and 30 visiting Pakistani individuals who were recruited at Dalian Medical University in March 2019. Both cohorts consisted of 24 healthy adults and 6 of their child offsprings (Table 1). All adults were students or young teachers of the Dalian Medical University, and the Pakistani adults and children had arrived in China for 0–18 months (average of 11 months) and 0–15 months (average of 9 months), respectively. Notably, the Chinese and Pakistani adults showed significant differences on their body mass index (BMI), dietary habit, and drinking and smoking rates (Table 1; Table S1), which seemed to be due to ethnic and lifestyle differences.
Table 1
Characteristics of the subjects.
| Adults | | | Children | | |
| Chinese | Pakistani | P-value | Chinese | Pakistani | P-value |
Number of subjects | 24 | 24 | | 6 | 6 | |
Sex, F/M | 1/23 | 1/23 | 1.000 | 3/3 | 3/3 | 1.000 |
Age, years | 26.0 ± 4.3 | 29.1 ± 3.7 | 0.011 | 2.8 ± 1.8 | 2.8 ± 1.7 | 1.000 |
Weight, kg | 69.6 ± 11.0 | 76.7 ± 15.6 | 0.076 | 14 ± 5.4 | 13.3 ± 4.2 | 0.794 |
BMI, kg/m2 | 22.8 ± 2.8 | 25.6 ± 4.5 | 0.011 | 16.0 ± 2.0 | 17.4 ± 3.1 | 0.396 |
Drinking, % | 50% | 8.3% | 0.003 | 0% | 0% | 1.000 |
Smoking, % | 16.7% | 33.3% | 0.030 | 0% | 0% | 1.000 |
Antibiotics (≤ 2mons), % | 8.3% | 8.3% | 1.000 | 0% | 0% | 1.000 |
Prebiotics (≤ 2mons), % | 58.3% | 41.7% | 0.387 | 66.7% | 50% | 1.000 |
Living in China, mons | | 11 ± 4 | | | 9 ± 6 | |
The data for age, weight, and BMI were presented as mean ± sd. P-values for age, weight, and BMI were calculated by Student’s t-test, and for sex, drinking, smoking, antibiotics, and prebiotics were calculated by Fisher’s exact test. |
Fecal samples of all participants were collected and treated using a unified approach (see Methods). To depict the gut viral characteristics of healthy individuals, we extracted DNA and RNA from fecal VLP factions and performed high throughput shotgun sequencing using the Illumina platform. To extend the content of total microbial community, the bacterial microbiome of feces was also profiled using whole-metagenomic sequencing. The analytical workflow of the DNA virome, RNA virome, and bacterial microbiome was shown in Fig. 1. Focusing on the comparison of gut viromes between Chinese and Pakistani individuals, overall, this study included six sections to elaborate the results:
1–2. DNA virome and its functional characteristics.
3–4. RNA virome and the concordance between DNA and RNA viromes.
5–6. Bacterial microbiome and the virus-bacteria associations.
Comparison of DNA viral community
We obtained 782 million high-quality non-human reads (12.1 ± 0.5 million per sample) through shotgun sequencing of the DNA viral community of 60 fecal samples. The reads were de novo assembled into 182,471 contigs with the minimum length threshold of 1kbp, of which 45.0% (82,119) were recognized as highly credible viral fragments based on their sequence features and homology to known viral genomes (Fig. 1). The remaining contigs were from bacterial or eukaryotic contaminations (26.8%) and dependency-associated sequences (6.0%), and 22.2% contigs were still unclassifiable. Despite that, average 82.3% of sequencing reads in all samples were captured by the viral contigs, revealing well representativeness of the high-abundance viral contents in human gut DNA virome. The viral contigs were further clustered into 54,947 “viral operational taxonomic units (vOTUs)” (a phylogenetic definition of discrete viral lineage that corresponds to “species” in prokaryotes, also named “viral population” [35]) by removing the redundant contigs of 95% nucleotide similarity. These vOTUs represented an average size of 3,054 ± 2,868 bp (Figure S1), which was comparable with similar studies [14] but remarkable lower than that of the available viral genomes (average 38.5 kbp for ~ 6,500 complete virus isolates from the RefSeq database), suggesting that the vOTUs were mostly fragmented genomes. Only 33.6% of vOTUs could be annotated into specific family, highlighting a considerable novelty of gut virome.
Rarefaction analysis showed that, despite the rarefaction curve was unsaturated under current number of samples in each group, the vOTU richness was significantly higher in Pakistani adults than in Chinese adults (p = 0.008, Fig. 2a). The within-sample diversity pattern of gut DNA viromes was assessed by macrodiversity (Shannon index) and microdiversity (nucleotide diversity or π [35]) at the vOTU level. The Chinese adults showed a lower Shannon index than the Pakistani adults, similarly for the children (Fig. 2b), but no significant difference in microdiversity was detected between Chinese and Pakistanis (Fig. 2c).
Next, we undertook a non-metric multidimensional scaling (NMDS) analysis to further understand the differences in fecal DNA viral communities between Chinese and Pakistanis. Clear separations were revealed in the viromes of both adults and children between Chinese and Pakistanis (adonis p < 0.001 for both adults and children; Fig. 2d). Notably, we also found that 1) the viral communities of Chinese adults and children were similar, but those of Pakistani adults and children were differed, and 2) the viral communities of Pakistani children were closer to Chinese subjects when compared with those of Pakistani adults. These findings were validated by the permutational multivariate analysis of variance (PERMANOVA) (Fig. 2e).
We finally compared the DNA virome composition of Chinese and Pakistani at the family level, ignoring the family-level unclassified vOTUs (which represented only 33.1% of total sequences). The most dominant viral families in all samples were Podoviridae-crAssphage (average relative abundance, 27.0 ± 30.7%), Siphoviridae (24.8 ± 25.5%) and Adenoviridae (23.7 ± 28.1%) (Fig. 2f). Compared with the Chinese adults, the viral communities of the Pakistani adults showed a significant increase of Adenoviridae, Anelloviridae, Marseilleviridae, and Lavidaviridae, and a remarkable depletion of Circoviridae and Rudiviridae (Mann-Whitney U test, q < 0.05; Fig. 2g). Adenoviridae, Myoviridae, Phycodnaviridae, Mimiviridae, Herelleviridae, and Inoviridae were significant higher in viral communities of Pakistani children (Fig. 2h), as compared with the Chinese children, while no viral family was lower.
Functional analysis of DNA virome
To better elucidate the functional capacity of the DNA viromes, we predicted a total of 221,418 protein-coding genes from the vOTUs (average of 4 genes per vOTU) and annotated functions of 24.2% of these genes based on the KEGG (Kyoto Encyclopedia of Genes and Genomes) [36] database. Analysis on KEGG pathway level B showed that functions involved in genetic information procession and signal and cellar processes are dominant in all samples (Fig. 3a), suggesting that these are core functions of the gut DNA virome. Compared with the Chinese adults, viral functions in the Pakistani adults were significantly decreased involving “protein families: metabolism”, amino acid metabolism, antimicrobial drug resistance, cell motility, and substance dependence, and increased in immune disease (Mann-Whitney U test, q < 0.05; Fig. 3b). For example, a putative hemolysin enzyme (K03699) that encoded by several Myoviridae and Siphoviridae viruses showed over 10-fold enrichment in the virome of Chinese adults compared to that of Pakistani adults. When compared with the Chinese children, a number of important functions, including carbohydrate metabolism, signal transduction, and cell growth and death, were significantly higher in the viral communities of Pakistani children, while the “protein families: genetic information processing” were lower (Fig. 3c).
We identified a total of 11,242 CAZymes (Carbohydrate-active enzymes [37]) from the viral genes, including 5,437 glycoside hydrolases, 3,270 glycosyl transferases, 1,993 carbohydrate binding, 396 carbohydrate esterases, 120 polysaccharide lyases, and 26 auxiliary activities (Fig. 3d). The majority (65.9%) of CAZymes were encoded by unclassified vOTUs, followed by Siphoviridae (12.1%) and Myoviridae (8.2%), suggesting their important roles in carbohydrate metabolism in gut viral ecosystem. Moreover, we also identified 37 acquired antibiotic resistance genes (ARGs) from the DNA vOTUs (Table S2). Most of these ARGs were related to tetracycline resistance (n = 12), macrolide resistance (n = 7), beta-lactamase (n = 7), and aminoglycoside resistance (n = 6). Taken together, these findings revealed that the DNA virus can widely express the carbohydrate metabolism-associated genes and are potentially involved into carrying and transmission of antibiotic resistance genes.
Comparison of RNA viral community
For RNA virome, we performed shotgun metatranscriptomic sequencing of 60 fecal samples described above and obtained 671 million reads (11 ± 3.4 million per sample) after removing the low-quality reads and bacterial ribosomal RNA contamination. A total of 99,454 contigs with minimum length threshold of 500 bp were assembled, 3,442 (3.5%) of which were identified as highly credible RNA viral fragments via blasting against the available RNA viral genomes and searching of the RNA-dependent RNA polymerase (RdRp) sequences (Fig. 1). 25.4% of these RNA viruses contained at least one RdRp gene, while 28 viral RdRp genes had no homology with any known virus in NCBI database. We obtained 569 RNA vOTUs based on clustering at 95% nucleic acid level similarity. The average size of these vOTUs was 1,162 ± 916 bp, which was fragmented compared with the available RNA viral genomes (average 7.4 kbp from ~ 4,000 isolates). Furthermore, considering that only average 24.8% reads of all samples were covered from the RNA vOTUs, we also used the available RNA viral genomes from the RefSeq database as a reference for analyzing of the gut RNA virome. 118 available RNA viruses were observed in our samples, which covered additional 1.3% reads (in average) for further analysis.
Rarefaction analysis showed that the detection of RNA virus was increased with the number of samples, and the accumulative curve was nearly saturated at nearly 10 samples (Fig. 4a). This is due to our RNA virus pipeline mainly focused on the known species and the sequence containing a RdRp gene, but high proportions of virus remain untagged and many of them are independent on RdRp gene [38]. Compared with Pakistanis, the macrodiversity (Shannon index) was significantly higher in Chinese adults, but there was no statistical difference in that of children (Fig. 4b).
NMDS analysis on the overall RNA vOTUs composition captured significant separation of adults between Chinese and Pakistanis (adonis p < 0.001; Fig. 4c), but of children the separation was visible but not significant (adonis p = 0.2). Likewise, the viral communities of Chinese adults and children were closer, yet of Pakistani adults and children.
Finally, to investigate the gut RNA viral signatures between Chinese and Pakistanis, we compared two cohorts on viral composition. At the family level, the dominant family Virgaviridae consisted of average 83.7% relative abundance in all samples (Fig. 4d), which was slightly but significantly enriched in Chinese adults compared with that in Pakistani adults (Fig. 4e). Three other families, Betaflexiviridae, Picornaviridae, and Astroviridae, was reduced in Chinese adults than in Pakistani adults (Mann-Whitney U test, q < 0.05 for all), while Picornaviridae was also reduced in Chinese children than in Pakistani children. At the species level, the plant-associated virus, including Pepper mild mottle virus (average relative abundance, 37.5 ± 23.1%), Tomato mosaic virus (27.1 ± 27.4%), and Tobacco mild green mosaic virus (14.1 ± 12.4%), composed of the dominant species in all samples (Fig. 4f). Compared with the Chinese adults, the viral communities of the Pakistani adults showed a significant increase of Shallot latent virus, Picornavirales Tottori-HG2, Aichivirus A, and Astrovirus VA3, and a remarkable depletion of Paprika mild mottle virus, Peach virus T, Enterovirus C, and Cosavirus A (Fig. 4g). When compared with the Chinese children, 9 species were significantly higher in viral communities of Pakistani children (Fig. 4h), with no species that was lower.
Concordance between DNA and RNA viromes
Having characterized the differences of DNA and RNA viromes between local Chinese residents and visiting Pakistanis, we wanted to examine the existence of concordance between DNA and RNA viromes. Although the DNA and RNA viromes were irrelevant in Shannon diversity index (Pearson r = 0.04, p = 0.7; Fig. 5a), the overall compositions of two types of viral community were strongly correlated (Procrustes correlation M2 = 0.37, p < 0.001; Fig. 5b). And this correlation was reproducible across nationality and age. Moreover, we identified 24 co-abundance correlations between 6 DNA and 9 RNA viral families (Spearman correlation test q < 0.05; Fig. 5c), including some positive correlations between Adenoviridae and several RNA viruses and a negative correlation between Herpesviridae and Tombusviridae. The significance of these relationships required further studies.
Comparison of bacterial microbiome
For bacterial microbiome, we obtained a total of 1,236 million reads (20.6 ± 7.7 million per sample) from the samples and quantified the relative abundances of a total of 833 taxa, including 12 phyla, 22 classified, 41 orders, 81 families, 179 genera, and 498 species, using MetaPhlAn2 [39]. Comparison on Shannon index showed that the bacterial microbiome of Chinese adults exhibited a significantly higher diversity than that of the Pakistanis (Fig. 6a), similarly but not significantly trend was observed in that of children. NMDS analysis on the overall bacterial composition also revealed significant separation between Chinese and Pakistan adults (adonis p < 0.001; Fig. 6b), as well as between Chinese and Pakistan children (adonis p < 0.001). Consistent with the observations in DNA and RNA viromes, the bacterial microbiome of Pakistan children was also close to that of Chinese subjects in tendency.
Taxonomically, the bacterial microbiome of Chinese adults showed significant enrichment of Lachnospiraceae, Ruminococcaceae, Eubacteriaceae, Enterobacteriaceae, Tannerellaceae, Rikenellaceae, Acidaminococcaceae Clostridiaceae, and Sutterellaceae and depletion of Prevotellaceae, Bifidobacteriaceae, Coriobacteriaceae, Lactobacillaceae, Oscillospiraceae, Selenomonadaceae, and Atopobiaceae, compared with that of Pakistani adults (linear discriminant analysis [LDA] score > 3; Fig. 6c). Similarly, Clostridiaceae, Eubacteriaceae, and Ruminococcaceae were enriched in Chinese children compared to Pakistani children, and Coriobacteriaceae was depleted. At the species level, the Chinese adults exhibited 28 enriched bacterial species and 19 decreased species when compared with the Pakistani adults, while the Chinese children showed 11 enriched species and 12 decreased species compared with the Pakistani children (Table S3). The exhibition of enormous differential taxa led to a dramatic distinction of enterotype constitution between Chinese and Pakistanis. The Chinese subjects was characterized by a high proportion of Bacteroides/Firmicutes-type (75% and 100% in adults and children, respectively), whereas almost of all Pakistani subjects were Prevotella-type (100% in adults and 66.7% in children) (Fig. 6d).
Virus-bacteria associations
To study the virus-bacteria correlation, first, we predicted the bacterial hosts of virus by searching the potential viral CRISPR spacers from bacterial metagenomic assemblies (see Methods). This approach allowed host assignments for 3,948 DNA and 4 RNA vOTUs, representing 7.2% and 0.7% of all DNA and RNA viruses, respectively. We revealed a large connection network of family-level known virus (n = 392) and its bacterial host (Fig. 7a), facilitated by frequent acquisition of phage/prophage in bacterial genomes and spread of phages across bacterial hosts. Members of Faecalibacterium, Prevotella, Ruminococcus, Bifidobacterium, Dialister, and Streptococcus were the most common host for human gut virome. Meanwhile, the crAss-like phages had infected the highest number of bacteria.
Then, we performed the PERMANOVA-based effect size analysis between gut virome and microbiome. 287 DNA vOTUs (q < 0.10), including members of Siphoviridae, Phycodnaviridae, and Podoviridae-crAssphage, and 25 RNA vOTUs (q < 0.10) showed significant affection on the bacterial microbiome communities (Fig. 7b-c). More importantly, combination of these DNA and RNA vOTUs explained 20.2% and 18.2% of the microbiome variance, respectively (Fig. 7d), suggesting that the effect size of the gut virome on bacterial microbiome is considerable. Similar effect sizes were found in subjects from two nations. Parallelly, 117 bacterial species were identified that significantly impact the holistic composition of DNA and RNA viromes, accounting for 13.2% virome variance (Fig. 7d). These species included Bifidobacterium angulatum, Streptococcus salivarius, Bacteroides coprophilus, and Prevotella copri (Fig. 7e).