Changes in NGS raw data during the bioinformatics pipeline
Total reads for all MCs exceeded 100,000, ranging from 123,255 (MC2) to 184,711 (MC5) (Table 2). After pre-filtering, total reads were between 84,090 (MC4) and 100,000 (MC1, MC3, and MC5). All 5 MCs had sufficient valid reads (more than 70,000). The percentages of identified reads for MC2 and MC4 were 98.9% and 98.0%, respectively. The identification rates of MC1 (67.4%), MC3 (87.1%), and MC5 (89.6%) were relatively low; however, this was caused by the inclusion of a Sphingobacterium strain suspected to be a new species. When the result for this strain (Sphingobacterium_uc) was treated as correct, the final identification rates of all MCs exceeded 98%. The number of identified reads differed among the 5 MCs although each MC contained the same CFUs of bacteria. The rates of misidentified reads ranged from 0.9% in MC3 to 3.6% in MC4.
The number of OTUs among the 5 MCs ranged from 65 (MC4) to 126 (MC5), and the number of species ranged from 48 (MC1) to 89 (MC5) (Table 2, Figure 1A). On average, 1.27 (MC2) to 2.10 (MC1) OTUs were matched to one species.
The ratio of the number of detected species to the number of expected species was between 2.8 (MC4 and MC5) and 5.3 (MC1). The corresponding ratio at the genus level was between 1.8 (MC5) and 4.0 (MC1). These ratios decreased slightly as the number of species in an MC increased.
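The detected-to-expected ratio can be illustrated with a minimal Python sketch. It assumes that the denominators reported with the fold-error proportions (9 species for MC1, 32 for MC5) are the expected species counts, and it uses the detected species counts given above (48 and 89). The function name is hypothetical and is not part of the study's pipeline.

```python
def detected_to_expected_ratio(detected: int, expected: int) -> float:
    """Ratio of detected taxa to expected taxa for one mock community."""
    if expected <= 0:
        raise ValueError("expected must be a positive count")
    return detected / expected


# (detected species, expected species); the expected counts are an assumption
# taken from the denominators reported with the fold-error proportions.
species_counts = {"MC1": (48, 9), "MC5": (89, 32)}

ratios = {
    mc: round(detected_to_expected_ratio(detected, expected), 1)
    for mc, (detected, expected) in species_counts.items()
}
print(ratios)
```

Under these assumptions the sketch gives 5.3 for MC1 and 2.8 for MC5, which agrees with the ratios reported above.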
Calculation of α-diversity by OTU
The OTU, ACE, Chao1, and Jackknife indices reflecting species richness showed a moderate correlation (r = 0.56−0.62) with the number of expected MC species (Table 2). For species evenness, the correlation between the Shannon index (range: 1.85−2.65) and the number of expected MC species was high (r = 0.82).
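For readers unfamiliar with these indices, the following Python sketch shows the textbook definitions of the Shannon index and the Chao1 richness estimator computed from a vector of OTU read counts. It is an illustration only: the software used in this study may apply different variants or corrections, and the toy counts are hypothetical, not data from the MCs.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Chao1 richness estimate from singleton (F1) and doubleton (F2) OTUs."""
    observed = sum(1 for c in counts if c > 0)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f2 > 0:
        return observed + (f1 * f1) / (2 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return observed + f1 * (f1 - 1) / 2


# Hypothetical OTU read counts, for illustration only.
even_counts = [10, 10, 10, 10]
skewed_counts = [5, 3, 1, 1, 2, 2]

print(shannon_index(even_counts))  # equals ln(4) for four equally abundant OTUs
print(chao1(skewed_counts))        # 6 observed OTUs + 2*2 / (2*2) = 7.0
```

The Shannon index grows with both the number of OTUs and how evenly reads are spread across them, whereas Chao1 adds an estimate of unseen OTUs based on how many were seen only once or twice.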
In all MCs, the number of OTUs changed little once the valid reads exceeded 60,000 (Figure 1A). The OTU ranks with relative abundance (log) > 0.001 were 16 (MC1), 18 (MC2 and MC3), 26 (MC4), and 38 (MC5) (Table 2, Figure 1B).
Relative abundance at the species level
There were significant differences in relative abundance, although all expected MC species were identified to the species level (Figures 2 and 3). At the phylum level, the relative abundance was highest in Bacteroidetes and also high in Fusobacteria within each MC. The relative abundance of Fusobacterium was highest (26.42%) in MC2 because MC2 does not contain Bacteroidetes. In this study, Bacteroidetes comprised Bacteroides fragilis (6.8−7.8), Chryseobacterium gleum group (4.4−4.8), and Sphingobacterium uc (2.0−2.8). Apart from Bacteroidetes and Fusobacterium, the Morganella morganii group (1.6−1.7) and Acinetobacter baumannii (1.0−2.49) showed high fold error.
The proportion of species with fold error between 0.5 and 1.5 was 33% (3 of 9), 50% (6 of 12), 13% (2 of 16), 28% (5 of 18), and 13% (4 of 32) among MC1, MC2, MC3, MC4, and MC5, respectively. Staphylococcus aureus (MC1), B. cepacia (MC2), C. difficile (MC3), E. cloacae (MC4), and P. aeruginosa (MC4 and MC5) had the lowest fold error in each MC (Figure 2). The cut-off value of the relative abundance that can identify all species among MCs was the highest in MC1 (1.94%) and lowest in MC5 (0.01%) (Table 2).
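The proportion of species within a fold-error window can be computed as in the Python sketch below. It assumes that fold error is the observed relative abundance divided by the expected relative abundance, that every species in an MC has the same expected abundance (1/n), and that the 0.5 to 1.5 bounds are inclusive; none of these details is stated explicitly here, so they should be treated as assumptions. The species labels and abundances are hypothetical.

```python
import math


def fold_errors(observed, expected):
    """Fold error per species: observed / expected relative abundance."""
    return {species: observed[species] / expected[species] for species in expected}


def proportion_within(errors, low=0.5, high=1.5):
    """Fraction of species whose fold error lies in [low, high] (inclusive)."""
    if not errors:
        raise ValueError("no species supplied")
    hits = sum(1 for value in errors.values() if low <= value <= high)
    return hits / len(errors)


# Hypothetical 4-species community with equal expected abundance (assumption).
species = ["sp_A", "sp_B", "sp_C", "sp_D"]
expected = {name: 1 / len(species) for name in species}
observed = {"sp_A": 0.60, "sp_B": 0.25, "sp_C": 0.10, "sp_D": 0.05}

errors = fold_errors(observed, expected)
share = proportion_within(errors)
print(errors)
print(f"{share:.0%} of species within 0.5-1.5 fold error")
```

In this toy community only sp_B falls inside the window, so the reported share is 1 of 4 species.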
A total of 422 species, 30 of which were included in the MCs, were identified in the negative control (Figure 2). The relative abundance of these 30 MC species ranged from a minimum of 0.001% (Enterococcus faecalis) to a maximum of 0.58% (Enterobacteriaceae group).
Misidentified results at the species level in the 5 MCs
We defined a result as misidentified when the final identification was a different species in the same genus as, or a different genus from, an expected MC species. Misidentification as a different species in the same genus was observed for most species, mostly in the Enterobacteriaceae group, Enterococcus, and Streptococcus (Figure 4). However, the relative abundance of these misidentified results was low: between 0.46% (MC3) and 1.11% (MC5). Corynebacterium striatum, Clostridioides difficile, Clostridium perfringens, Aeromonas caviae, Haemophilus influenzae, and Stenotrophomonas maltophilia were identified at the species level. Pantoea, Erwinia, Cronobacter, Cosenzaea, and Raoultella were identified although they were not included in the MCs (Table 3).
Differences in relative abundance by bacterial characteristics
The fold error was significantly lower in gram-positive bacteria (Figure 3). There was no difference in fold error by 16S rRNA copy number or genome size. Bacteria with a GC content of 60% to 70% showed significantly lower fold error.