2.1 Data preprocessing and quality control
The total DNA extracted from the samples was quality-checked. The concentration and purity of 49 samples (ED: 26, HD: 23) met the sequencing requirements of the Illumina MiSeq platform.
2.2 Analysis of bacterial diversity in fecal samples
2.2.2 Analysis of bacterial flora composition of samples
According to the OTU annotation results, a histogram of relative species abundance was generated for each sample at each classification level, giving a visual summary of the bacterial flora of each sample at that level. Figures 1-2 show stacked histograms of the bacteria with relative abundance greater than 1% in each sample at the genus level. The abscissa is the sample name, and the ordinate is the relative abundance of the corresponding bacteria. "F_ *" denotes a taxon that could not be annotated to the genus level but could be annotated to the family or another classification level.
The ED and HD groups did not differ in their dominant genera: there was no statistically significant difference between the groups in either the high-abundance bacteria (top 10) or the core bacteria (90%). However, among all genera with relative abundance greater than 1%, Alloprevotella differed significantly between the groups; it was identified only in the HD group.
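The conversion from read counts to the relative abundances plotted in such a histogram can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the sample counts, and the choice to pool low-abundance genera under an "Other" label are all assumptions made for the example.

```python
def relative_abundance(counts):
    """Convert raw read counts per genus into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {genus: n / total for genus, n in counts.items()}


def filter_abundant(rel, threshold=0.01):
    """Keep genera above the threshold; pool the rest under 'Other'.

    Pooling into 'Other' is an assumption of this sketch, not something
    stated in the text.
    """
    kept = {g: r for g, r in rel.items() if r > threshold}
    other = sum(r for r in rel.values() if r <= threshold)
    if other > 0:
        kept["Other"] = other
    return kept


# Hypothetical read counts for a single sample (illustrative values only).
sample_counts = {"Bacteroides": 600, "Blautia": 300, "Roseburia": 95, "Rare_genus": 5}
rel = relative_abundance(sample_counts)
plotted = filter_abundant(rel)  # genera that would appear in the stacked bar
```

Each stacked bar in the figure would then correspond to one such dictionary per sample.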
2.2.3 Alpha diversity analysis
Alpha diversity refers to the diversity of the microbial community within a specific region or ecological environment and is a comprehensive index reflecting both richness and evenness. Alpha diversity analysis of a single sample therefore reflects the richness and diversity of the microbial community in that sample. Figure 3 shows that the ED group had a lower Shannon diversity index than the HD group (HD group = 5.741, ED group = 4.982), indicating that the ED group had lower bacterial diversity (P=0.00074X<0.01).
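For reference, the Shannon index of one sample can be computed from its OTU counts as H = -Σ p_i log(p_i), where p_i is the proportion of reads assigned to OTU i. The sketch below is illustrative only; the function name is hypothetical, and the logarithm base is left as a parameter because the text does not state which base was used.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon diversity H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    h = 0.0
    for n in counts:
        if n > 0:  # OTUs with zero reads contribute nothing
            p = n / total
            h -= p * math.log(p, base)
    return h


# Hypothetical OTU counts: an even community versus a skewed one.
even = shannon_index([25, 25, 25, 25], base=2)
skewed = shannon_index([97, 1, 1, 1], base=2)
```

With the same number of OTUs, the evenly distributed community yields the higher index, which is why the index reflects evenness as well as richness.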
2.2.4 PCoA analysis based on Beta diversity
Principal coordinates analysis (PCoA) is used to study the similarity or dissimilarity of community composition among samples. The closer two samples are in the plot, the more similar their community structures; the farther apart they are, the greater the difference between their communities. Figure 4 shows the PCoA based on unweighted UniFrac distances, which are calculated from the phylogenetic relationships among OTUs. The x-axis and y-axis each represent one principal coordinate, and the percentage on each axis is the proportion of the variation among samples explained by that coordinate. Each point in the figure represents a sample, and samples of the same group are shown in the same color. The ED group (blue) and the HD group (red) each formed a cluster, and the two groups were clearly separated, indicating that the community structure differed between the ED and HD groups.
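The ordination step can be summarized by the standard classical-scaling equations behind PCoA. The notation is introduced here for illustration and does not come from the text: d_ij is the distance between samples i and j (here, unweighted UniFrac), n is the number of samples, and x_ik is the coordinate of sample i on axis k.

```latex
\begin{aligned}
D^{(2)}_{ij} &= d_{ij}^{2}, \qquad J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}, \\
B &= -\tfrac{1}{2}\, J\, D^{(2)}\, J, \\
B &= V \Lambda V^{\top}, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n, \\
x_{ik} &= \sqrt{\lambda_k}\; v_{ik} \quad (\lambda_k > 0), \\
\text{explained}_k &= \frac{\lambda_k}{\sum_{j:\,\lambda_j > 0} \lambda_j}.
\end{aligned}
```

The last line is one common convention for the axis percentages. Because UniFrac is not guaranteed to be a Euclidean distance, some eigenvalues can be negative, and software packages differ in how they treat them when reporting the percentage explained.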
2.3 Comparison of relative abundance between groups of all strains
Statistical analysis can identify the genera whose abundance differs significantly between groups and show in which group each of these genera is enriched. In addition, the within-group and between-group differences can be compared to determine whether the difference in community structure between groups is statistically significant.
2.3.1 t-test
A t-test was used to detect significant differences between groups at the genus level (P<0.05). The results (Table 1 and Figures 5-6) show that six genera differed significantly between groups: Streptococcus and Subdoligranulum were more abundant in the ED group than in the HD group, whereas the "platts bacteria" genus, Blautia, the Lachnospiraceae NK4A136 group, and Roseburia were less abundant in the ED group than in the HD group.
Table 1 Genera that differed significantly between the ED and HD groups
Note: *: P<0.05; **: P<0.01
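As an illustration of the per-genus comparison, the sketch below computes a two-sample t statistic from the relative abundances of one genus in the two groups. The text does not say whether the equal-variance (Student) or unequal-variance (Welch) form was used; the Welch form is assumed here, and the function name and abundance values are hypothetical. Converting the statistic to a P value additionally requires the t distribution with the returned degrees of freedom, which is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    va = variance(a) / na  # squared standard error of mean(a)
    vb = variance(b) / nb  # squared standard error of mean(b)
    se2 = va + vb
    if se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical relative abundances of one genus (illustrative values only).
ed_group = [0.010, 0.020, 0.030, 0.040, 0.050]
hd_group = [0.020, 0.040, 0.060, 0.080, 0.100]
t_stat, dof = welch_t(ed_group, hd_group)  # negative t: lower mean in ED
```

When many genera are tested in this way, a multiple-testing correction is usually considered; the text does not state whether one was applied.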
2.3.2 LEfSe
To identify bacterial genera characteristic of ED, LEfSe analysis was used to compare the genera that differed between the two groups. The results (Figure 7) showed that 24 bacterial genera had an absolute LDA score above 2. The Lachnospiraceae NK4A136 group and Prevotella_9 had the greatest influence on the difference between groups.
Comments:
1) LDA is a supervised dimension reduction method, while PCA is an unsupervised dimension reduction method.
2) LDA can reduce the data to at most K-1 dimensions, where K is the number of categories, while PCA has no such restriction.
3) In addition to dimension reduction, LDA can also be used for classification.
4) LDA selects the projection direction with the best classification performance, while PCA selects the direction with the greatest variance for the projection of sample points.
The main advantages of LDA algorithm are as follows:
1) Prior knowledge of categories can be used in dimension reduction, while unsupervised learning such as PCA cannot use prior knowledge of categories.
2) LDA is better than PCA when the sample classification information depends on the mean rather than the variance.
The main disadvantages of LDA algorithm are as follows:
1) LDA is not suitable for dimensionality reduction of samples with non-Gaussian distributions; PCA has the same problem.
2) LDA can reduce the data to at most K-1 dimensions, where K is the number of categories. If the target dimension is greater than K-1, LDA cannot be used. Some extended versions of LDA circumvent this problem.
3) When the sample classification information depends on the variance rather than the mean, LDA performs poorly for dimension reduction.
4) LDA may overfit the data.
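The supervised projection described in these comments can be illustrated with a minimal two-class Fisher LDA for samples with two features, where the direction is w = Sw^(-1) (mu1 - mu0) and Sw is the pooled within-class scatter matrix. With K = 2 categories the result is a single axis, consistent with the K-1 limit. This is a sketch for illustration only and is not the LEfSe implementation; the function names and data points are hypothetical.

```python
def fisher_lda_direction(class0, class1):
    """Two-class Fisher LDA for 2-feature samples: w = Sw^{-1} (mu1 - mu0)."""

    def mean2(rows):
        n = len(rows)
        return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

    def scatter(rows, mu):
        # Entries of the symmetric 2x2 scatter matrix [[sxx, sxy], [sxy, syy]].
        sxx = sum((r[0] - mu[0]) ** 2 for r in rows)
        sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows)
        syy = sum((r[1] - mu[1]) ** 2 for r in rows)
        return sxx, sxy, syy

    mu0, mu1 = mean2(class0), mean2(class1)
    a0, b0, c0 = scatter(class0, mu0)
    a1, b1, c1 = scatter(class1, mu1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # within-class scatter Sw = [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mu1[0] - mu0[0], mu1[1] - mu0[1]
    # Apply the closed-form 2x2 inverse: Sw^{-1} = (1/det) * [[c, -b], [-b, a]].
    return ((c * dx - b * dy) / det, (a * dy - b * dx) / det)


def project(rows, w):
    """Project each 2-feature sample onto the discriminant direction w."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]


# Hypothetical samples: the classes differ in the mean of the first feature only.
group0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
group1 = [(5, 2), (6, 1), (6, 3), (7, 2)]
w = fisher_lda_direction(group0, group1)
scores0, scores1 = project(group0, w), project(group1, w)
```

In this example the class labels steer the projection toward the feature whose mean differs between groups, which is the use of prior category knowledge that PCA lacks.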