Metagenome Sequencing and Assembly Statistics
Total microbial DNA extracted from tempeh samples collected from two tempeh producers in Bogor, Indonesia, was subjected to Illumina whole-metagenome sequencing. After quality trimming, the average effective rate of clean reads across the two raw metagenomic data sets was 99.93%. A total of 29,030,144 of 78,995,980 EMP clean reads (36.74%) and 17,146,676 of 108,045,092 WJB clean reads (15.86%) mapped to the Rhizopus spp. reference genome. The co-assembly step produced 293,961 contigs. The longest contig was 485,167 bp, and the N50 was 1,994 bp.
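For readers unfamiliar with the N50 statistic reported here, the following minimal sketch shows how it is commonly computed from a list of contig lengths: the contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running sum first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of an iterable of contig lengths.

    Illustrative helper (hypothetical name): the N50 is the length of the
    contig at which the cumulative length, counted from the longest contig
    downwards, first reaches at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example (made-up lengths, not data from this study):
# total = 1500 bp, half = 750 bp; 500 + 400 = 900 >= 750, so N50 = 400.
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

An N50 of 1,994 bp therefore means that half of the assembled bases lie in contigs of at least 1,994 bp.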
Taxonomic and Functional Profiling
Contigs were assigned taxonomically on the basis of the taxonomic assignments of their individual genes. The SqueezeMeta pipeline implements a fast last common ancestor (LCA) algorithm that analyzes the hits returned for each query gene by a Diamond[19] search against the GenBank nr database. Each contig is then annotated with a consensus taxon, that is, the taxon to which most of its genes belong. Hits must pass a minimum amino acid identity (AAI) level to be assigned to a given taxonomic rank; the thresholds for phylum and genus were 40% and 60%, respectively[20]. Metagenome reads were mapped onto the contigs using Bowtie2[21] to estimate the abundance of each gene and contig. A total of 263,096 contigs (89.5%) received a phylum-level annotation. Proteobacteria was the dominant phylum in both the EMP (74.54%) and WJB (85.38%) metagenomes. At the genus level, 211,603 contigs (72%) were annotated. Novosphingobium was a relatively abundant genus in the EMP metagenome (27.16%) and within the Proteobacteria phylum of EMP (26.81%). Enterobacter was a relatively abundant genus in the WJB metagenome (34.39%) and within the Proteobacteria phylum of WJB (33.93%). The Firmicutes phylum was relatively more abundant in the EMP metagenome (10.07%) than in the WJB metagenome (3.23%). Leuconostoc (3.83%), Enterococcus (2.99%), and Lactobacillus (1.98%) were the top three genera in the Firmicutes phylum of EMP. In the Firmicutes phylum of WJB, Enterococcus (1.17%) was the most abundant genus. The abundance of the Bacteroidetes phylum was similar in the EMP (1.38%) and WJB (1.84%) metagenomes (Figure 1). Functional profiling used the latest publicly available version of the KEGG database for KEGG ID annotation. The iron complex outer membrane receptor protein (KEGG ID: K02014) was the most abundantly represented function in the metagenomes of the tempeh samples (Figure 2).
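The consensus step described above can be illustrated with a simplified sketch. It applies the identity thresholds stated in the text (40% for phylum, 60% for genus) to each gene hit and then takes a majority vote over the genes of a contig. This is an assumption-laden illustration, not SqueezeMeta's actual implementation: the function names, the dictionary layout of a hit, and the strict-majority rule are hypothetical, and the example hits are made up.

```python
from collections import Counter

# Minimum amino acid identity (%) required per rank, as stated in the text.
MIN_IDENTITY = {"phylum": 40.0, "genus": 60.0}


def gene_taxon(hit, rank):
    """Return the taxon of one gene hit at `rank`, or None if identity is too low.

    `hit` is a hypothetical dict such as
    {"identity": 85.0, "phylum": "Proteobacteria", "genus": "Enterobacter"}.
    """
    if hit["identity"] < MIN_IDENTITY[rank]:
        return None
    return hit.get(rank)


def contig_consensus(gene_hits, rank):
    """Majority vote over the genes of one contig at the given rank.

    Assumption: a taxon is accepted only if more than half of all genes on
    the contig support it; otherwise the contig stays unclassified (None).
    """
    taxa = (gene_taxon(hit, rank) for hit in gene_hits)
    votes = Counter(taxon for taxon in taxa if taxon is not None)
    if not votes:
        return None
    taxon, count = votes.most_common(1)[0]
    return taxon if count > len(gene_hits) / 2 else None


# Made-up hits for a single contig with three genes.
example_hits = [
    {"identity": 85.0, "phylum": "Proteobacteria", "genus": "Enterobacter"},
    {"identity": 55.0, "phylum": "Proteobacteria", "genus": "Enterobacter"},
    {"identity": 72.0, "phylum": "Proteobacteria", "genus": "Klebsiella"},
]
phylum_call = contig_consensus(example_hits, "phylum")
genus_call = contig_consensus(example_hits, "genus")
```

In this toy contig all three genes pass the 40% phylum threshold and agree, so the contig receives a phylum annotation; at the genus level one hit falls below 60% identity and the remaining two disagree, so no genus is assigned. This mirrors why fewer contigs were annotated at the genus level than at the phylum level.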
Binning and Bin Check
Binning of the co-assembled EMP and WJB metagenome samples with DAS Tool[22] yielded 25 bins. According to the CheckM[23] results, eleven bins were categorized as good-quality bins, with completeness above 75% and contamination below 10% (Table 1). Of these good-quality bins, four were categorized as high-quality bins, with completeness above 90%.
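The quality categories used here can be expressed as a short rule, sketched below under the thresholds given in the text. The function name `classify_bin`, the label "other" for bins that fail the good-quality criteria, and the example values are illustrative assumptions; the completeness and contamination inputs would come from a CheckM report.

```python
def classify_bin(completeness, contamination):
    """Classify a bin from its CheckM completeness and contamination (both in %).

    Thresholds follow the text: good quality requires completeness > 75% and
    contamination < 10%; high quality is the subset of good-quality bins with
    completeness > 90%. The label "other" is a hypothetical placeholder for
    bins that meet neither definition.
    """
    if completeness > 75.0 and contamination < 10.0:
        return "high" if completeness > 90.0 else "good"
    return "other"


# Made-up example values, not entries from Table 1.
examples = {
    "bin_a": classify_bin(95.2, 1.3),
    "bin_b": classify_bin(80.0, 5.0),
    "bin_c": classify_bin(95.0, 12.0),
    "bin_d": classify_bin(60.0, 2.0),
}
```

Because high-quality bins are defined as a subset of good-quality bins, the four high-quality bins are included in the count of eleven good-quality bins.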