In this study, we used ANN models to predict the diversity and structure of the microbial community, as well as the functional groups, in AS systems. We also evaluated the importance of environmental factors in these predictions.
The use of artificial neural network (ANN) models in this study increased the power to predict complex microbial community systems. When ANN models were used to predict ASVs appearing in at least 10% of the samples (ASVs>10%), 60.82% of these ASVs had a prediction accuracy ($R^2_{1:1}$) exceeding 30%. In a previous study, a multiple regression model could explain only about 15% of the variability in the genus-level taxonomy of a soil bacterial community(18) and predicted only the top ten taxa of that community. Compared with that study, our prediction accuracy was greatly improved, and the prediction range was extended to all ASVs appearing in at least 10% of samples, which demonstrates the potential of ANN models for predicting complex microbial community systems.
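The accuracy measure quoted above is an R² evaluated against the 1:1 line (predicted = observed) rather than against a fitted regression line. The following is a minimal sketch of that calculation, assuming the usual definition 1 - SS_res / SS_tot; the function name `r2_one_to_one` and the toy numbers are illustrative and do not come from the study.

```python
def r2_one_to_one(observed, predicted):
    """R^2 relative to the 1:1 line: 1 - SS_res / SS_tot.

    Assumed definition; unlike a squared Pearson correlation, it
    penalizes systematic offsets between predictions and observations.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with >= 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residuals are measured directly against the prediction (the 1:1 line).
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical relative abundances of one ASV in four samples.
obs = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated, but offset by 1
print(r2_one_to_one(obs, obs))      # perfect prediction
print(r2_one_to_one(obs, shifted))  # about 0.2 despite correlation of 1
```

Under this definition a model can correlate perfectly with the observations and still score poorly, which makes a 30% threshold stricter than it would be for a correlation-based R².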
Using the Neutral Community Model (NCM) proposed by Sloan et al.(28), this study transformed migration from a vague qualitative concept into a quantity with biological meaning, the potential migration rate (m). Higher values of m indicate that a species is less limited by dispersal. The low migration rate of high-abundance taxa and the high migration rate of low-abundance taxa in this study (Additional file 2: Figure S10) indicate that dispersal limitation has a significant effect on high-abundance taxa but not on low-abundance taxa, which is consistent with findings for some ecosystems(39, 40). High-abundance taxa with a low migration rate appear in a sample mainly because of environmental selection(41), so their relative abundance can be well predicted from the environmental factors. In contrast, low-abundance taxa with high migration rates usually appear in a sample when migration occurs and the spatial heterogeneity of the sample provides them with ecological niches. Neither the randomness of migration nor the spatial heterogeneity of samples was reflected in our input environmental variables, so these variables were less predictive of low-abundance taxa. In addition, low-abundance taxa have been reported to have higher abundance variability than high-abundance taxa(34), and prediction targets with higher variability make a predictive model less stable. Together, these factors explain why the relative abundance of high-abundance taxa was significantly more predictable than that of low-abundance taxa.
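For orientation, the relation through which m is usually estimated can be written as follows. This is the form of the Sloan model commonly fitted to sequencing data, given here as a sketch. The symbols Freq_i (expected occurrence frequency of taxon i across samples), N (number of sequences per sample), and p_i (mean relative abundance of taxon i in the source community) are not defined in this section and are assumptions, as is the use of 1/N as the detection limit.

```latex
\mathrm{Freq}_i \;=\; 1 - I\!\left(\tfrac{1}{N};\; N m\, p_i,\; N m\,(1 - p_i)\right)
```

Here I(x; a, b) is the regularized incomplete beta function, that is, the cumulative distribution function of a beta distribution with shape parameters a and b. In this form, m is the single free parameter, typically estimated by nonlinear least squares against the observed occurrence frequencies.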
The weight of an environmental factor in a predictive model reflects, to a certain extent, its influence on the corresponding prediction target. For example, our results showed that the most important environmental factors for predicting evenness and richness were DO and IndConInf, respectively. Evenness and richness are two critical indicators of the diversity of ecological communities. The former describes how evenly abundance is distributed among species, and the latter describes the number of species. Previous studies have demonstrated that the relative abundances of some functional taxa are sensitive to changes in DO(42, 43), and the abundance of these functional bacteria reflects the differences in species abundance within the community. Therefore, DO has a high weight in predicting the evenness of microbial communities in AS systems. Industrial wastewater contains many toxic and harmful substances(44, 45), in which many microorganisms cannot survive. Industrial wastewater therefore directly affects the population of microorganisms(46), and IndConInf plays an important role in predicting the richness of microbial communities in AS systems. The environmental factors with the highest weights in the predictive models of the nitrogen removal-related taxa ASV6 and ASV142 were AtInfTN, Nitri, and NO3N (Table S6). This correspondence between functions and environmental factors indicates that environmental factors with high weights in predicting microbial taxa may play an essential role in environmental filtering in the deterministic process of community assembly.
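The distinction between richness and evenness drawn above can be made concrete with a small calculation. The sketch below counts observed taxa for richness and uses Pielou's J (Shannon entropy divided by its maximum, ln S) for evenness. Pielou's J is an assumption made for illustration, since this section does not state which evenness index was used, and the abundance vectors are invented.

```python
import math


def richness(abundances):
    """Number of taxa with non-zero abundance in a sample."""
    return sum(1 for a in abundances if a > 0)


def pielou_evenness(abundances):
    """Pielou's J = H' / ln(S), with H' the Shannon entropy (assumed index)."""
    present = [a for a in abundances if a > 0]
    s = len(present)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two taxa")
    total = sum(present)
    shannon = -sum((a / total) * math.log(a / total) for a in present)
    return shannon / math.log(s)


# Two hypothetical samples with the same richness but different evenness.
balanced = [25, 25, 25, 25]
dominated = [97, 1, 1, 1]
print(richness(balanced), pielou_evenness(balanced))
print(richness(dominated), pielou_evenness(dominated))
```

Both samples contain four taxa, yet the second is dominated by one of them and scores much lower on evenness, which is why a factor such as DO can drive evenness without changing richness.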
Important factors that cannot be identified using traditional methods may be highlighted by ANN modeling. Conventional studies on AS systems have focused only on correlations between environmental factors and microbial communities(16, 17, 47), which limited the range of environmental factors considered as key. For example, some previous studies showed apparent differences between the microbial communities of industrial sewage and municipal sewage(14, 48), suggesting that the IndConInf variable might affect the microbial community of the AS system(49). However, despite its high importance weight in the ASVs>10% predictive models, IndConInf was not significantly associated with the microbial community structure (Table S7). By analyzing the importance weights of environmental factors in predictive models, this study identified variables that require further attention and that can help to better predict and control the microbial community of AS systems.
Although our work contributes to the prediction and interpretation of microbial community structure in AS systems, we still cannot explain the weights of some environmental factors in the predictive models because of the black-box nature of ANN models. Our results show that environmental factors whose distributions have low skewness and low kurtosis are more likely to have higher weights in predicting the relative abundance of microbial taxa, which we cannot explain using current knowledge. Increasing the interpretability of ANN models will help us make better use of this powerful predictive tool to address such questions, and it is also a future direction for machine learning-based big data analysis.
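The two distribution descriptors mentioned above can be computed from the central moments of each environmental variable. The following is a minimal sketch using population (biased) moments and excess kurtosis, so that a normal distribution scores 0. Whether the study used these exact estimators is not stated here, so both choices are assumptions, and the example values are invented.

```python
def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Population skewness: m3 / m2**1.5 (assumed estimator)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return _central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (assumed estimator)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return _central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical measurements of two environmental variables.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 1.0, 10.0]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

Computing both values for every input variable and comparing them with the importance weights is one simple way to reproduce the kind of relationship described above.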