The use of ecological knowledge and monitoring data for stream modelling greatly improves the assessment of habitats and the understanding of the relationship between environmental variables and the occurrence of certain organisms (Huong et al., 2009). Numerous studies have attempted to develop reliable ANN methods to assess the status and health of ecosystems (Liu et al., 2023).
This work focuses on macroinvertebrates in large rivers. However, the general aspects are very similar for other freshwater communities in terms of modelling approaches. The input variables differ depending on the community but typically include geographical and seasonal parameters, habitat quality parameters (flow velocity, vegetation) as well as physico-chemical properties (dissolved oxygen, water temperature, pH, nutrient concentrations, COD), and toxicity characteristics (Graf et al., 2015).
Pollutants significantly influence primary production in aquatic ecosystems, diminishing the survival chances of organisms. Based on the obtained results, a variety of taxa, such as species belonging to Oligochaeta (Annelida), Chironomidae (Insecta: Diptera), and Mollusca, were present at sites where different pollutants occurred.
Organic compounds, including pesticides and PAHs, can lead to structural modifications in macroinvertebrate communities in aquatic ecosystems (Szöcs et al., 2012). Pesticides have many sources and are considered pollutants with the potential to reduce both the relative abundance and the number of sensitive taxa in macroinvertebrate communities, even at minimal toxicity levels (Orlinskiy et al., 2015; Schriever et al., 2007; Thiere & Schulz, 2004). Variables that are less easy to use in a model also need to be included, because many of them, such as heavy metals and other potential toxicants, are often left out, while other variables are redundant and could be removed. PAHs are organic chemicals with known adverse effects, including oxidative stress and carcinogenicity, and require standardised methods for environmental risk assessment. Benthic macroinvertebrates play a crucial role as robust bioindicators of PAHs, which emphasises the need for a consistent approach to contamination exposure and biomarker interpretation (Onyena et al., 2023). Bromacil is a herbicide commonly used in agriculture, and Pisidium species, which belong to the class Bivalvia, are known to be relatively tolerant of this chemical. However, it is important to remember that any herbicide, even at tolerated levels, can have an impact on aquatic ecosystems and non-target organisms. The exceptional accuracy of ANN modelling (MSE = 1.25 × 10⁻²) suggests that these species could be potential bioindicators for identifying the presence of certain herbicides. The effects of a given pollutant can range from population decline to species extinction. Although mussels do not directly indicate the presence of pesticides, these compounds can have neurotoxic effects on mussel species, which are recognized using biomarkers of pesticide exposure (Solé et al., 2018).
In the entire Danube, the concentration of Bentazone was in the range of 0–0.008 µg/l, with a concentration of 0.004 µg/l. Even though pesticides are a major stress factor for macroinvertebrate populations, the modelling results showed that some macroinvertebrate species exhibit resistance to the effects of Bentazone in aquatic ecosystems. Certain macroinvertebrate taxa show a positive correlation with the presence of pesticides in river courses (Palma et al., 2018). However, macroinvertebrate taxa that have been observed in polluted streams can also occur in unpolluted streams (Neumann et al., 2003). This underlines the need for further investigation of the chemical status of habitats where macroinvertebrate taxa that are proven positive indicators of pesticide presence have been identified. In this modelling study, the bivalve Corbicula fluminea and the chironomid Tanytarsus sp. were identified as taxa with a high correlation to Bentazone (Popović et al., 2019). Due to its rapid response to environmental changes, Corbicula fluminea has been proposed as a good bioindicator for environmental quality assessment (Vranković & Slavić, 2015). The AI-based model, created using the presence of these two species as inputs and the concentrations of Bentazone as outputs, showed exceptional accuracy. The models achieved remarkably low MSE values, with the LSTM neural network reaching an MSE of 0.16 × 10⁻⁵, which emphasizes its reliability and precision in predicting Bentazone concentrations.
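Because MSE is the accuracy measure reported throughout this discussion, a minimal sketch of how it is computed from observed and predicted concentrations may be helpful. The function name `mse` and the four site values below are hypothetical and serve only as an illustration; they are not data or code from this study.

```python
def mse(observed, predicted):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical concentrations (µg/l) at four sampling sites; illustrative only.
observed = [0.000, 0.004, 0.008, 0.002]
predicted = [0.001, 0.004, 0.007, 0.002]

error = mse(observed, predicted)
print(f"MSE = {error:.2e}")
```

Since MSE is expressed in squared units of the output variable, values obtained for different xenobiotics are comparable only if the concentrations are on the same scale or have been normalised in the same way.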
In this study, no differences in MSE values were observed after the inclusion of Oligochaeta species in the ANN model. Similarly, Popović et al. (2019) show that Oligochaeta is not recognized as tolerant of Bentazone, while Kuzmanovic et al. (2017) found more oligochaetes (Lumbriculus sp., Enchytraeidae, Limnodrilus sp.) and chironomids (Nanocladius sp., Stictochironomus sp., and Micropsectra sp.) in the pesticide-polluted areas of the rivers where fine sediments predominated. The high abundance of Tanytarsus species possibly indicates coarser, sandier sediment than at the other sites. This finding leads us to conclude that the inclusion of sediment types in future studies could provide more accurate results.
A link to 2,4-Dinitrophenol (2,4-DNP) was observed in species from six different taxa groups, namely Bivalvia, Gastropoda, Oligochaeta, Crustacea, Chironomidae and Odonata. The observed MSE of 0.0001 for the ANN model, based on this input/output combination, demonstrates the exceptional accuracy of the model in predicting specific xenobiotics at a sampling site where these macroinvertebrates were detected. 2,4-DNP is a phenolic compound that is used both as a wood preservative and as a pesticide and poses significant risks to freshwater organisms (Mayhew & Stephenson, 1999). Guidance values to protect aquatic organisms from the harmful effects of this compound are extremely limited (Kwak et al., 2020). Therefore, there is a need for further research and the development of comprehensive guidelines to ensure the well-being of aquatic ecosystems and the organisms that live in them.
The Transformer neural network model, which was created using species positively correlated with Fluoranthene as input data and the corresponding measured concentrations of this xenobiotic as output, has an exceptionally low MSE value of 5.02 × 10⁻⁵. This value demonstrates the accuracy and precision of the model in predicting Fluoranthene concentrations from input data derived from the tolerant species. In the case of 4LP modelling, when the model inputs are restricted to FS-selected species of the class Bivalvia only, the MSE value of the 4LP model decreases substantially, from 4 × 10⁻⁴ to 2.87 × 10⁻⁵. This decrease indicates that these bivalve species are the most suitable bioindicators for Fluoranthene, as their response to the xenobiotic leads to lower prediction errors than those obtained with the other species in the model (Zuloaga et al., n.d.). When analyzing the occurrence of mollusc species, it should be noted that allochthonous taxa such as Corbicula, Dreissena, Sinanodonta, and Physella acuta are known to have higher contaminant tolerance than autochthonous taxa such as Pisidium, Sphaerium, Unio and Theodoxus. This fact should therefore be considered when interpreting the results (Dillon et al., 2002).
Based on the findings from the literature review, neural networks used in the modelling of macroinvertebrate communities in rivers predominantly use feedforward connections, which in certain cases are supplemented by self-organizing maps (Goethals et al., 2006).
In the field of environmental engineering, LSTM networks are employed to predict dissolved oxygen levels using chemical parameters as input (Heddam et al., 2022). To date, there are no published articles investigating the use of LSTM in modelling the relationships between biological and chemical parameters. The vision transformer model offers an innovative method for modelling biological parameters. Notably, no published articles use a combination of inputs and outputs similar to that used in this research. A recent publication describing the application of transformer-based neural networks to macroinvertebrate communities focuses specifically on the identification of families of Japanese mayflies (order Ephemeroptera). This type of modelling is primarily related to computer vision and differs from the approach followed in our research. Therefore, a direct comparison of the results is not possible (https://ecoevorxiv.org/repository/view/6695/).