Sample collection and bacterial genomic DNA extraction
All methods in this study were approved by the Research Medical Ethics Committee of Shanghai University of Medicine & Health Sciences Affiliated Zhoupu Hospital. Fecal samples were collected from a 34-year-old male and a 10-month-old infant and stored in an ultra-low temperature freezer (Haier, Qingdao, China). Genomic DNA was extracted with the FastDNA Spin Kit for Feces (MPbio, California, USA). To obtain sufficient high-quality gut microbiota genomic DNA, we modified the experimental procedure as follows. We added 500 mg of feces to a 2 ml Lysing Matrix E tube, mixed the feces with 825 μl Sodium Phosphate Buffer and 275 μl PLS solution, then shook the mixture and vortexed it for 15 seconds. We centrifuged the samples at 14,000 g for 5 minutes at room temperature and decanted the supernatant. Subsequently, we added 978 μl Sodium Phosphate Buffer, shook the mixture and vortexed it for 15 seconds, then added 122 μl MT Buffer and shook the tube up and down gently for 5 minutes. We placed the samples in a shaker at 4°C for 30 minutes, centrifuged them at 14,000 g for 5 minutes, and transferred the supernatant to a clean EP tube. We added 250 μl of PPS solution, shook vigorously to mix, incubated at 4°C for 10 minutes, and centrifuged at 14,000 g for 2 minutes. The supernatant was transferred to the Binding Matrix Solution in a 15 ml conical tube and shaken gently for 5 minutes. We then centrifuged the samples at 14,000 g for 2 minutes and decanted the supernatant. Afterwards, we washed the binding mixture pellet with 1 ml Wash Buffer #1, transferred the binding mixture to a SPIN Filter tube, and centrifuged at 14,000 g for 1 minute. We emptied the catch tube, added 500 μl of prepared Wash Buffer #2 to the SPIN Filter tube, and gently resuspended the pellet. We then centrifuged the samples at 14,000 g twice to remove residual ethanol.
Finally, we transferred the SPIN Filter bucket to a clean 1.9 ml Catch Tube and added 100 μl TES to resuspend the genomic DNA. The DNA was examined by agarose gel electrophoresis, and the top bands were isolated and purified with a DNA Purification Kit (Finegene, Shanghai, China). DNA concentration and integrity were assessed with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, USA).
For Pacific Biosciences library preparation and SMRT sequencing, DNA was fragmented with a Covaris g-TUBE device (10 kb) and concentrated with AMPure PB beads following the manufacturer's protocol (Beckman Coulter Co., USA). DNA damage and ends were repaired in a LoBind microcentrifuge tube. Blunt-end ligation was performed by adding 1 μL of blunt adaptor (20 μM) and 1 μL of ligase to 30 μL of DNA, followed by incubation at room temperature for 15 min. SMRTbell™ templates were purified with AMPure PB beads, and their concentration was measured by Qubit. Sequencing was performed on a PacBio Sequel instrument by OE Biotech Co., Ltd (Shanghai, China).
Metagenome assembly was performed with Flye after valid reads were obtained. Open reading frames (ORFs) in the assembled scaffolds were predicted with Prodigal and translated into amino acid sequences. A non-redundant gene set was built from all predicted genes using CD-HIT, with clustering parameters of 95% identity and 90% coverage. The longest gene in each cluster was selected as the representative sequence of that cluster. The representative (amino acid) sequences were annotated against the NR, KEGG, COG, SWISSPROT and GO databases with an e-value threshold of 1e-5. Species taxonomy was obtained from the taxonomy database corresponding to the NR library.
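The representative-selection rule described above (longest gene per cluster) can be illustrated with a minimal Python sketch. It is not the CD-HIT implementation: the gene names, sequences, cluster assignments and the function name pick_representatives are hypothetical, and the tie-breaking rule is an assumption made only so the result is deterministic.

```python
from collections import defaultdict


def pick_representatives(sequences, cluster_of):
    """Return {cluster_id: gene_id}, choosing the longest sequence in each cluster.

    sequences:  {gene_id: sequence string}
    cluster_of: {gene_id: cluster_id}, e.g. parsed from a clustering result
    """
    members = defaultdict(list)
    for gene_id, cluster_id in cluster_of.items():
        members[cluster_id].append(gene_id)
    # Assumption: ties in length are broken by gene ID, purely for determinism.
    return {
        cluster_id: max(genes, key=lambda g: (len(sequences[g]), g))
        for cluster_id, genes in members.items()
    }


# Hypothetical toy data: two clusters, three predicted genes.
sequences = {
    "gene_a": "MKTAYIAKQR",
    "gene_b": "MKTAYIAKQRQISF",
    "gene_c": "MSLNE",
}
cluster_of = {"gene_a": 0, "gene_b": 0, "gene_c": 1}

representatives = pick_representatives(sequences, cluster_of)
print(representatives)
```

In this toy input, gene_b represents cluster 0 because it is longer than gene_a, and gene_c is the only member of cluster 1.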
NCBI prokaryotes genome databases
The prokaryote genome data were acquired from the NCBI genome database (https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/). At the time of this study, the database listed 266,319 prokaryote genomes (Chromosome (3,186), Complete (19,702), Contig (141,127), Scaffold (102,304)) (Supplementary Fig. 1), of which 108,506 were associated with humans (Chromosome (1,539), Complete (8,179), Contig (57,034), Scaffold (41,754)) (Fig. 1a).
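As a consistency check on the counts above, the per-level figures can be summed and compared with the reported totals. The short Python sketch below uses only the numbers quoted in the text; the variable names are illustrative.

```python
# Genome counts by assembly level, as quoted in the text.
all_prokaryotes = {
    "Chromosome": 3186,
    "Complete": 19702,
    "Contig": 141127,
    "Scaffold": 102304,
}
human_associated = {
    "Chromosome": 1539,
    "Complete": 8179,
    "Contig": 57034,
    "Scaffold": 41754,
}

total_all = sum(all_prokaryotes.values())
total_human = sum(human_associated.values())

# Fraction of listed prokaryote genomes that are human-associated.
human_share = total_human / total_all

print(f"all: {total_all:,}  human-associated: {total_human:,}  share: {human_share:.1%}")
```

Both sums agree with the reported totals of 266,319 and 108,506 genomes.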
BLAST search in the NCBI database
We extracted the full-length 16S rRNA sequences from the assembled contigs and queried them against the NCBI database with BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome). We screened similar sequences in order of identity and downloaded the related sequences. We then identified the bacterial species of the assembled contigs and analyzed their evolutionary relationships.
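Ordering hits by identity can also be done programmatically when BLAST results are exported in the standard 12-column tabular format. The Python sketch below is an illustration only: the searches described here were run through the NCBI web interface, and the example hit lines, subject names, function name rank_hits and optional min_identity cutoff are hypothetical.

```python
import csv
import io

# Column names of the standard BLAST tabular output (outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def rank_hits(tabular_text, min_identity=0.0):
    """Parse tab-separated BLAST hits and return them sorted by percent identity.

    min_identity is a hypothetical optional cutoff; the default keeps every hit.
    """
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    hits = []
    for row in reader:
        row["pident"] = float(row["pident"])
        if row["pident"] >= min_identity:
            hits.append(row)
    # Highest identity first; sorted() is stable, so ties keep their input order.
    return sorted(hits, key=lambda hit: hit["pident"], reverse=True)


# Hypothetical hits for one full-length 16S rRNA query.
example = (
    "contig_1_16S\tsubject_A\t98.10\t1498\t28\t0\t1\t1498\t1\t1498\t0.0\t2600\n"
    "contig_1_16S\tsubject_B\t99.80\t1500\t3\t0\t1\t1500\t1\t1500\t0.0\t2750\n"
    "contig_1_16S\tsubject_C\t95.40\t1490\t68\t1\t1\t1490\t5\t1494\t0.0\t2380\n"
)

ranked = rank_hits(example)
ranked_subjects = [hit["sseqid"] for hit in ranked]
print(ranked_subjects)
```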
ClustalW and phylogenetic tree analysis
To examine the differences between the full-length 16S rRNAs, we compared the sequences and visualized the differences using BioEdit software (Borland Software Corporation, Scotts Valley, USA). For phylogenetic tree analysis, we queried the full-length 16S rRNAs against the NCBI database with BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) and selected the most closely related full-length 16S rRNAs by identity. With these 16S rRNAs, we constructed the phylogenetic tree using MEGA7.
Single bacteria analysis
The single bacteria analysis included gene prediction, ncRNA prediction, repeat sequence prediction, non-redundant analysis and common functional potential analyses. Gene prediction was performed with the Prokaryotic Dynamic Programming Genefinding Algorithm (Prodigal v2.6.3); the results included gene number, average gene length (bp) and GC% (gene region). ncRNAs were predicted with three tools: tRNAscan-SE (v1.3.1) for tRNA, RNAmmer (v1.2) for rRNA, and Rfam (v10.0) for sRNA. Repeat sequences were predicted with RepeatMasker (v4.0.7). The common functional potential analyses used the Non-redundant database (https://www.ncbi.nlm.nih.gov), Swiss-Prot (http://www.uniprot.org), KEGG (http://www.genome.jp/kegg/pathway.html), Clusters of Orthologous Groups of proteins (https://www.ncbi.nlm.nih.gov/COG/), the Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca) and the Carbohydrate-Active enZYmes database (http://www.cazy.org).
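The three gene-prediction summary values (gene number, average gene length and GC% of the gene region) can be computed from a FASTA file of predicted gene sequences. The Python sketch below is a minimal illustration, not the pipeline used here: the example sequences and function names are hypothetical, and GC% is assumed to be pooled over all gene bases rather than averaged per gene.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {record_id: uppercase sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID up to the first whitespace
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {record_id: "".join(parts) for record_id, parts in records.items()}


def gene_summary(genes):
    """Summarize predicted genes: count, mean length (bp) and pooled GC%."""
    lengths = [len(seq) for seq in genes.values()]
    total_bases = sum(lengths)
    gc_bases = sum(seq.count("G") + seq.count("C") for seq in genes.values())
    return {
        "gene_number": len(genes),
        "average_gene_length_bp": total_bases / len(genes) if genes else 0.0,
        # Assumption: GC% is computed over all gene bases pooled together.
        "gc_percent": 100.0 * gc_bases / total_bases if total_bases else 0.0,
    }


# Hypothetical nucleotide sequences of two predicted genes.
example_fasta = """>gene_1 hypothetical
ATGGCGCGTTAA
>gene_2 hypothetical
ATGAAATTTGGG
CCCTAA
"""

summary = gene_summary(parse_fasta(example_fasta))
print(summary)
```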
R programming language v. 3.4.3 was used for statistical analysis. Statistical significance between two groups was determined using an unpaired two-tailed Student’s t test. Data are presented as mean ± SD (standard deviation) or mean ± SEM (standard error of the mean) as indicated in the figure legends. P values were considered statistically significant at P < 0.05.
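The difference between SD and SEM, and the statistic behind an unpaired Student's t test, can be shown with a short worked example. The analysis itself was carried out in R; the Python sketch below only illustrates the arithmetic with hypothetical measurements, assumes equal variances (the classical Student's form), and stops at the t statistic and degrees of freedom without computing a P value.

```python
import math
import statistics


def describe(values):
    """Return the mean, sample standard deviation (SD) and standard error (SEM)."""
    sd = statistics.stdev(values)
    return {
        "mean": statistics.mean(values),
        "sd": sd,
        "sem": sd / math.sqrt(len(values)),
    }


def student_t(group_a, group_b):
    """Unpaired Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t_value = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t_value, df


# Hypothetical measurements for two groups.
group_a = [2.0, 4.0, 6.0]
group_b = [5.0, 7.0, 9.0]

summary_a = describe(group_a)
t_value, df = student_t(group_a, group_b)
print(summary_a, round(t_value, 4), df)
```

The two-tailed P value is then read from the t distribution with df degrees of freedom and compared with the 0.05 threshold.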