Sequencing quality analysis
A total of 3,667,351 valid tag sequences were generated from the 60 samples, with an average of 61,132 high-quality sequences per sample. The mean length of the valid tags ranged from 406 to 421 bp across samples, with an overall average of 414 bp. Sequencing depth (coverage) ranged from 99% to 100%, indicating that all samples were adequately sequenced.
The number of OTUs
The IBS-D patients and HCs shared 813 OTUs. In addition, 734 OTUs were unique to the IBS-D patients and 575 were unique to the HCs. Details can be found in Supplementary Material Table S2.
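The shared and group-specific counts follow from simple set operations on the OTU identifiers detected in each group. The sketch below is illustrative only: the OTU identifiers and the names `partition_otus`, `ibs_d_otus`, and `hc_otus` are assumptions, not part of the study's pipeline. It also checks that the reported counts add up to the per-group OTU totals given in Table 1.

```python
def partition_otus(group_a, group_b):
    """Return (shared, only_a, only_b) sets of OTU identifiers."""
    a, b = set(group_a), set(group_b)
    return a & b, a - b, b - a

# Hypothetical OTU identifiers, for illustration only.
ibs_d_otus = {"OTU_1", "OTU_2", "OTU_3", "OTU_5"}
hc_otus = {"OTU_1", "OTU_2", "OTU_4"}

shared, only_ibs_d, only_hc = partition_otus(ibs_d_otus, hc_otus)
print(len(shared), len(only_ibs_d), len(only_hc))

# Consistency check on the counts reported in the text:
# shared + group-specific OTUs should equal each group's total in Table 1.
reported_shared, reported_ibs_d_only, reported_hc_only = 813, 734, 575
total_ibs_d = reported_shared + reported_ibs_d_only  # 1547
total_hc = reported_shared + reported_hc_only        # 1388
print(total_ibs_d, total_hc)
```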
Table 1 Characteristics and OTUs of two groups
Group | IBS-D | HCs | P-value
Number | 30 | 30 | /
Gender: male | 13 | 15 | 0.293
Gender: female | 17 | 15 |
Age | 40.3±14.7 | 40.9±14.4 | 0.874
BMI | 19.97±5.58 | 21.39±3.83 | 0.014
Staple food: rice | 26 | 27 | 1.000
Staple food: wheat | 4 | 3 |
OTUs | 1547 | 1388 | 0.001
Characterization of fecal microbiota
The gut microbiota composition differed significantly between IBS-D patients and HCs (Supplementary Material Tables S3 and S4). Relative abundance at the phylum level is shown as a histogram in Figure 1. Compared with HCs, IBS-D patients showed a significant decrease in Firmicutes (P<0.05), Fusobacteria (P<0.01), and Actinobacteria (P<0.01), and a significant increase in Proteobacteria (P<0.01).
Species abundance clustering at the genus level is shown as a heat map in Figure 2. Compared with HCs, Enterobacteriaceae was significantly increased (P<0.01), and Alloprevotella (P<0.01) and Fusobacterium (P<0.01) were significantly decreased in IBS-D patients.
To further examine the phylogenetic relationships at the genus level, representative sequences of the top 100 genera were aligned by multiple sequence alignment, and the result is shown in Figure 3.
Statistical analysis was performed to identify taxa that differed significantly between the two groups. A t-test at the phylum level (Figure 4) showed that Firmicutes and Proteobacteria differed significantly between the two groups.
To assess both statistical significance and biological relevance, we performed LEfSe analysis on the two groups. LEfSe identifies the taxa whose abundance differs between groups and that are therefore candidate biomarkers. As shown in Figure 5, the biomarkers of HCs included Enterobacteriales, Gammaproteobacteria, and Proteobacteria, and those of IBS-D included Clostridiales, Clostridia, and Firmicutes.
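The text does not state which t-test variant was used. As a hedged illustration, the sketch below computes Welch's unequal-variance t statistic and its approximate degrees of freedom for one taxon; the abundance values and the function name `welch_t` are made up for the example. The P-value would then be read from the t distribution, for instance with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two group means.
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical relative abundances of one phylum in each group.
ibs_d = [0.42, 0.38, 0.45, 0.40, 0.36]
hcs = [0.55, 0.60, 0.52, 0.58, 0.57]

t, df = welch_t(ibs_d, hcs)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A negative t here means the first group has the lower mean abundance.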
Alpha diversity
Alpha diversity indices (shannon, simpson, chao1, ACE, goods_coverage, PD_whole_tree) were calculated for each sample at a 97% identity threshold (Table 2; number of sequences per sample selected during normalization, cutoff = 48286). Community diversity differed significantly between IBS-D and HCs (P<0.05 for the shannon index; Table 2) and was lower in IBS-D than in HCs. Community richness and sequencing depth did not differ significantly between the two groups. The between-group comparisons of the alpha diversity indices are shown in Figure 6.
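For readers unfamiliar with these indices, the sketch below computes several of them from the OTU counts of a single sample. It is a minimal illustration under stated assumptions: the Shannon index uses log base 2, Simpson is taken as 1 minus the sum of squared proportions, and chao1 uses the bias-corrected form. The study's software may use different conventions. ACE and PD_whole_tree are omitted because they need rare-abundance thresholds and a phylogenetic tree, respectively. The function name and the counts are hypothetical.

```python
from math import log2

def alpha_diversity(counts):
    """Compute several alpha diversity indices from one sample's OTU counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    props = [c / n for c in counts]
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return {
        "observed_species": len(counts),
        # Shannon index, log base 2 (an assumption; other bases are common).
        "shannon": -sum(p * log2(p) for p in props),
        # Simpson index as 1 - sum(p_i^2).
        "simpson": 1 - sum(p * p for p in props),
        # Bias-corrected Chao1 richness estimator.
        "chao1": len(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1)),
        # Good's coverage: share of reads not belonging to singleton OTUs.
        "goods_coverage": 1 - singletons / n,
    }

# Hypothetical OTU counts for one sample.
print(alpha_diversity([5, 1, 1, 2, 0]))
```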
Table 2 Comparison of Alpha diversity index between IBS-D and HCs
Group | species | shannon | simpson | chao1 | ACE | goods_coverage | PD_whole_tree
IBS-D | 311 | 4.708 | 0.890 | 357.980 | 374.214 | 0.998 | 44.200
HCs | 330 | 5.084 | 0.921 | 372.200 | 385.747 | 0.999 | 42.731
P-value | 0.191 | 0.011 | 0.191 | 0.374 | 0.602 | 0.445 | 0.343
Beta diversity
Principal coordinate analysis (PCoA) was used to visualize the distances between samples. We performed PCoA based on weighted and unweighted UniFrac distances (Figure 7). The closer two samples lie in the plot, the more similar their species composition. Details can be found in Supplementary Material 4.
Principal component analysis (PCA) extracts the two coordinate axes that capture the greatest variation between samples, so that differences in multidimensional data can be displayed in a two-dimensional plot and simple patterns underlying complex data can be revealed. The more similar the community composition of two samples, the closer they lie in the PCA plot (Figure 8). To overcome the limitations of linear models (including PCA and PCoA) and to better reflect the nonlinear structure of ecological data, we also conducted non-metric multidimensional scaling (NMDS) analysis (Figure 8).
To study the similarity between samples, we constructed a sample cluster tree. UPGMA clustering was performed on the weighted and unweighted UniFrac distance matrices, and the clustering results were combined with the phylum-level relative abundance of each sample, as shown in Figure 9.
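To make the distance underlying the PCoA concrete, the sketch below computes an unweighted UniFrac distance on a toy tree: the branch length found in only one of two samples, divided by the branch length found in at least one of them. The tree, the OTU names, and the representation of the tree as (length, descendant tips) pairs are assumptions made for illustration; real analyses use dedicated software on the full phylogeny.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of observed branch length that is unique to one of two samples.

    branches: list of (length, tips) pairs, where tips is the set of OTUs
    descending from that branch. sample_a, sample_b: sets of OTUs present.
    At least one sample must contain an OTU from the tree.
    """
    unique = observed = 0.0
    for length, tips in branches:
        in_a = bool(tips & sample_a)
        in_b = bool(tips & sample_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:  # branch leads to OTUs from only one sample
                unique += length
    return unique / observed

# Toy tree ((A:1,B:1):1,(C:1,D:1):1), one entry per branch.
toy_branches = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (1.0, {"C", "D"}),
]

print(unweighted_unifrac(toy_branches, {"A", "C"}, {"A", "D"}))
```

Samples with identical OTU sets give a distance of 0, and samples that share no branch give 1.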
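UPGMA is average-linkage hierarchical clustering: the two closest clusters are merged repeatedly, and the distance from a merged cluster to any other cluster is the size-weighted mean of the two old distances. The sketch below is a minimal pure-Python version run on a made-up distance matrix; the sample names, the distances, and the function name `upgma` are hypothetical, and branch heights are not tracked.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: list of sample names. dist: symmetric matrix (list of lists).
    Returns a nested tuple describing the merge order.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of clusters.
        (i, j), _ = min(d.items(), key=lambda kv: kv[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted mean distance from the merged cluster to the rest.
        new_d = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop distances that involve the two merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical pairwise distances between four samples.
names = ["S1", "S2", "S3", "S4"]
matrix = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
print(upgma(names, matrix))
```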
PICRUSt analysis
Taking level 1 as an example, we selected the 10 most abundant functions in each sample or group according to the database annotation results and plotted a histogram of their relative abundance, to show the dominant functions and their proportions in each sample. As shown in Figure 10, the functional genes of the two groups were mainly involved in metabolism, genetic information processing, and environmental information processing. Based on the functional annotation and abundance information of the samples, the 35 most abundant functions and their abundance in each sample were used to draw a heat map, and clustering was performed at the level of functional differences. In IBS-D patients, the predicted functions of the gut microbiota were up-regulated in metabolism of cofactors and vitamins and in xenobiotics biodegradation and metabolism, and down-regulated in environmental adaptation, cell growth and death, and metabolism of other amino acids (Figure 11). Details can be found in Supplementary Material Tables S5, S6, S7, and S8.
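Selecting the most abundant functions amounts to ranking the annotated categories by abundance and converting the counts to proportions. The sketch below is illustrative only: the abundance values and the function name `top_functions` are made up, and relative abundance is taken over all categories in the sample, which is an assumption about how the histogram was normalized.

```python
def top_functions(counts, n=10):
    """Return the n most abundant functions as (name, relative abundance) pairs."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value / total) for name, value in ranked[:n]]

# Hypothetical predicted function abundances for one sample.
sample_functions = {
    "Metabolism": 500,
    "Genetic Information Processing": 200,
    "Environmental Information Processing": 150,
    "Cellular Processes": 100,
    "Unclassified": 50,
}

for name, share in top_functions(sample_functions, n=3):
    print(f"{name}: {share:.1%}")
```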