This study followed standard ODMAP protocols for implementing species distribution models (SDMs) (Zurell et al. 2020), using multiple model algorithms to compare model performance and select the best performing model for projecting earthworm spatial distribution throughout Germany. Furthermore, we used several methods (expert judgement, statistical techniques, ecological-relevance analysis) to select environmental predictor variables, thus enabling us to cover a large spectrum of relevant environmental variables for robust modelling of earthworm species distribution.
Machine learning algorithms such as Random Forests (RF), Generalised Boosted regression Models (GBM) or MAXENT have been suggested to perform better than traditional regression models such as Generalized Linear regression Models (GLM) or Generalized Additive regression Models (GAM) (Elith et al. 2006; Li and Wang 2013; Valavi, 2022). Although RF has hitherto only rarely been used and its potential underutilized in SDMs, its high prediction performance has recently attracted attention in applied ecological studies (e.g., Mi et al. 2017). RF and GBM have been described as ensemble classifiers, which consist of and use several alternative trees in decision making while building model predictions (Li and Wang, 2013; Guisan et al. 2017). Although exceeding the performance of GLM and GAM in this study, GBM only predicted within the 3rd quantile range of the total-density and species-richness data. The comparison of the goodness-of-fit statistics (R², CI, AUC, and Kappa), the observed: predicted data fits (data not shown) as well as the resultant maps of predicted density and richness by all models illustrated the good performance of RF. For instance, this algorithm was able to predict beyond the 3rd quartile range of density field data, including maximum densities over 600 ind/m2. Other studies corroborate our finding of RF algorithms having the best predictive performance (e.g., Marmion et al. 2008; Mi et al. 2017; Valavi et al. 2022). We did, however, observe goodness-of-fit improvements for density predictions in RF after including additional data from Bavaria, confirming reports that RF can be data sensitive (Valavi et al. 2021; Yiu 2021). The resulting partial response curves explaining the relationships between communities (or species) and environment further exemplify how the RF models produce ecologically informative results (Cutler et al. 2007, Mi et al. 2017).
The high goodness-of-fit results for the RF models notwithstanding, any predictive model is only as good as the underlying data used for calibration. With over 20,000 data records from close to 1000 sites of occurrence, the biological background data can be considered large and highly sufficient. Although the earthworm data records also included data on the environmental predictors in over 40% of the cases, providing high thematic association, this is patchy and needed to be augmented by external data. This is critical for soil parameters, for which it is often difficult to obtain comprehensive, nationwide data that is not based on broad interpolations (inappropriate due to the high small-scale heterogeneity of soil). However, not all relevant parameters could be included. For instance, Creamer et al. (2019, in Baritz et al. 2021) regarded indicators of soil organic-matter quality (i.e., C:N, N:P relationships) to also be highly important for soil organisms, which was not available for Germany. We are also aware of the potential difficulties of using external habitat data since a temporal disconnect between earthworm observation and habitat-type overviews may contain land-use changes. Fortunately, habitat type was the most common environmental metadata included with the earthworm data, ensuring a broader 1:1 association for model calibration. Finally, only abiotic variables were considered as predictor variables; any interactions with other organisms (i.e., between earthworm species, other soil fauna or microorganisms) were not considered, since large-scale data for other organisms is also not available. Despite not being able to consider every potential driver of earthworm distribution, the models did include a high number of the most important environmental parameters known to effect earthworm fitness (e.g., Lee 1985; Edwards & Arancom 2022).
Although the model predictions have not yet been validated in the field, expert collations of earthworm species’ autecology confirm a majority of the predictions. Notable are the predicted species’ responses to soil pH, which identified many acidophobous and some acidophilus or -tolerant species, with a threshold between pH 4 and 5. Graefe & Beylich (2003) also reported a strong species-specific differentiation, with a common threshold of pH 4.2, except for i.e., the acidophobous A. longa with a threshold of pH 5, as also predicted by our models. Our predictions of species’ responses to soil acidity are also widely confirmed by, e.g., Sims & Gerard (1999), Jänsch et al. (2013), Krück (2018), and Sherlock (2018). These authors as well as Römbke et al. (2018) and Lehmitz et al. (2016) also described species-specific preferences for soil organic-matter (SOM) content, which were almost completely confirmed by the model predictions. Some of these authors also considered preferences for clay content, which were generally but not always confirmed by the current model predictions. For instance, Jänsch et al. (2013) reported on D. octaedra’s preference for soils with low clay content and A. cholorotica’s slight preference for clay soils, both of which were contradicted by our results. Also, our predicted positive response of L. terrestris to soils with lower clay and silt content is contrary to the assessment of Sims & Gerard (1999) and Sherlock (2018) that this species prefers clay-rich soils (these authors, however, regard UK populations).
Our study supported earlier work on the effects of precipitation and soil moisture, where these variables accounted for population increases and the distribution of adult earthworms (Lavelle, 1978; Lavelle and Spain, 2005; Kalu et al. 2015; Rajwar et al. 2022). Philips et al. (2019) via simpler statistical methods identified climate (average annual precipitation and temperature) as the almost exclusive driver of earthworm communities (total density, species richness) at a global scale. While our study confirmed the combined role of temperature, precipitation and soil moisture, it also identified, i.e., habitat type, soil pH and soil organic matter as important drivers. Since at global scales climate also drives, e.g., natural vegetation and partly also soil genesis, it is plausible that statistical methods will therefore identify climate over other environmental parameters as drivers at global scales. Our study at a more regional scale, however, also demonstrated the importance of habitat and soil parameters as additional drivers of earthworm distribution. Therefore, the relevant drivers of earthworm biodiversity are apparently scale dependent; climate parameters being important at global and regional scales, while habitat and soil factors become more important at smaller spatial scales. At the most local scales, these later factors may be most important and anthropogenic land-use measures will also increasingly influence earthworm biodiversity.
This study predicted occurrence probabilities beyond the traditional forest, grassland and arable land by encompassing all terrestrial EUNIS level-1 habitat types. The predictions highly corresponded to the suggested range-size classifications. For instance, most of the “large-range” species were predicted both to be broadly distributed throughout many regions in Germany and to occur equally in many diverse habitat types, often with probabilities > 50–60%, thus indicating their ecologically generalist nature. The Red List of Germany lists these species as being “very common” (Lehmitz et al. 2016), and they have been reported to occur in many different habitat types (e.g., Sims & Gerard, 1999; Jänsch et al. 2013; Römbke et al. 2018; Sherlock 2018), confirming our results. Although L. terrestris is generally viewed as being eurytopic, it has sometimes been noted as having a slight preference for grassland sites (Sims & Gerard 1999; Jänsch et al. 2013; Sherlock 2018), which is confirmed by our predictions that, however, also equally predicted forest habitats. This species has been said to be disturbance intolerant (Lehmitz et al. 2016; Römbke et al. 2018), which may explain its low predicted probabilities in naturally disturbed sites (i.e., floodplains, bogs) as well as anthropogenically influenced habitat types.
The species we classified as “mid-range” were also predicted to occur widely in Germany, albeit in much lower probabilities. The German Red List lists them all as being “common”. Although predicted to occur in many different habitat types, they appear to be more habitat discriminant, with preference optima in specific habitat types. For instance, A. chlorotica was predicted to occur more strongly in agrarian sites (arable fields or grassland), which has been reported from observational data (i.e., Jänsch et al. 2013; Römbke et al. 2018). On the other hand, D. octaedra was predicted to occur mostly in forest habitats, as also noted by, e.g., Jänsch et al. (2013), Römbke et al. (2018), Sherlock (2018). Considering its acidophilus nature, its preference is likely for coniferous forests (cf. Sherlock, 2018). While A. castaneus seems to be more generalist in nature, we predicted its highest occurrence probabilities to be in forests and floodplains, which is contradicted by, i.e., Jänsch et al. (2013), Römbke et al. (2018), and Krück (2018), who consider its preference to also be for grasslands. Interestingly, A. longa was predicted by our models to also be somewhat generalist, occurring in different habitat types, but to be missing in wetter habitats (e.g., islands, coastal, floodplains, bogs). This is confirmed by Krück (2018), who attested a preference for dryer habitats, but contradicted by Sims & Gerard (1999), who noted its occurrence in floodplains of the UK.
The species identified as “restricted-range” all showed higher occurrence probabilities limited to specific regions and habitat types. The Red List of Germany lists them all as being “rare” or “very rare”. The highest distribution probabilities of D. attemsi were more in hilly or mountainous regions of Germany; its highest probabilities were predicted for arable land (and secondarily in forests), which has not been noted by previous authors (except Sherlock 2018). A. eiseni was predicted by the models to most likely occur in forests (of central and southern Germany), as also documented by Römbke et al. (2018) and Lehmitz et al. (2016). L. limicola is known to be hydrophilous (Sims & Gerard 1999; Lehmitz et al. 2016; Krück 2018; Römbke et al., 2018; Sherlock 2018). Accordingly, the models predicted it to occur in floodplains with high probabilities, as well as in grasslands and forests (which may also be located in floodplains or similar, but possibly misclassified to more general habitat types in the data). The models predicted it to occur mostly in western Germany (and most strongly along the Rhine River valley), confirming Krück (2018) who noted its very rare occurrence in northeastern Germany. L. badensis is an endemic species of Germany, occurring in forests of the High Black Forest (southwestern Germany) (Lehmitz et al. 2016), as confirmed by the model predictions.
Interesting are the few species predicted to occur in “fringe” habitats. For instance, L. limicola, L. rubellus and L. castaneus were predicted to occur in marine-influenced habitats (i.e., islands), and A. caliginosa, A. chlorotica and L. limicola in coastal sites; all however with low (< 35%) occurrence probabilities, indicating patchy occurrence in these habitats. Notable were the large number of species predicted in moderate probabilities to occur in urban, industrial, and other anthropogenic sites. Although the occurrence probabilities in these habitat types were often low, these results represent the first of their kind and can help assessment of soil-biodiversity surveys in such areas.
At the community level, the inverse relationship between species richness and total density (as found primarily for north-eastern Germany) is a common occurrence in ecology, where an area may exhibit high individual densities, but low species richness (Verberk et al. 2011). Less favorable environmental conditions may only allow the occurrence of few species, but these in large populations, perhaps due to reduced competition from other species (Groves 2022). This could possibly explain the high individual density, but low species richness predicted for north-eastern Germany, which is known for dryer, sandy soils and where forests are usually coniferous plantations. In this regard, the predicted high occurrence probabilities of D. octaedra and L. rubellus in these areas are conspicuous, both of which are epigeic acidophilous (or –tolerant) species with a predicted affinity for forests. Personal observations have often shown high population densities of very few epigeic species in forests on sandy soils, rendering this explanation plausible. On the other hand, the Bavarian Alps and Rhine valley were predicted to be among the few regions with high earthworm biodiversity (both total density and species richness) in Germany. The Rhine Valley is known for rich soils and high general biodiversity and the predictions in alpine regions support the Alpine convention declaration of the Alps being one of the richest regions in Europe in terms of plant and animal diversity (Alpine convention, 2014).
Range size has long been recognized as being a good indicator for assessing a species’ threat status. This study did not intend to create a red list, as this already exists for earthworms in Germany (Lehmitz et al. 2016). However, mapping earthworm species’ spatial distribution and determining their geographic range size helped categorize species into range-size groups, which enables a rapid assessment of species’ threat status and their conservation needs and provides valuable information for setting conservation priorities (IUCN 2012a,b; 2022). Within Germany, no species was considered threatened under extent-of-occurrence (EOO) criteria (all studied species’ EEOs exceeded the minimum threshold of 20,000 km2; IUCN 2012, a, b). Nonetheless, a comparison of the predicted distribution maps and the calculated area of occupancy (AOO) shows that certain species should be of concern due to their restricted AOO range in Germany, i.e., A. eiseni, D. attemsi and A. limicola or due to being endemic in Germany, such as L. badensis.
Divergent opinions exist on the status of A. eiseni in Germany; while Bouché (1972) and Graff (1953) classified it as rare in France and Germany, respectively, this was contradicted by Römbke et al. (2017) who considered the species to be common. Our findings tend to partly support the earlier opinions of Graff (1953) and Bouché (1972) and the intermediary position of Lehmitz et al. (2016), who judged it to be moderately common (we prefer the term “restricted range “). While our predictions confirmed the restricted occurrence of A. eiseni in Hessian forests (Römbke et al. 2017) and a few other clusters, caution must be exercised. The species is assumed to be arboreal and corticolous, and the limited observational data may be methodologically biased, as common earthworm extraction methods may not sufficiently sample this species’ preferred microhabitat (Lehmitz et al. 2016; Römbke et al. 2018). A. limicola is the only species studied here that is listed as endangered in the German Red List of earthworms. Its predicted occurrence in wetland sites, corroborating its status as a hydrophilous species, as well as its predicted restricted distribution in Germany, supports its endangered status. The remaining species should be viewed as species of focus in future soil- biodiversity surveys.
Especially noted is L. badensis, an endemic species found in the high Black Forest region, and which is likely endangered (Lehmitz et al. 2016). Although insufficient observational data was available for calculating EOO or AOO, it was predicted to have a very restricted and narrow distributional range, corroborating its assessment as endangered. This species also highlights an important aspect of species distribution modelling: although the models predict potential occurrence e.g., in the Bavarian Alps, the species has never been found to occur there. The model results primarily show the high potential habitat suitability for the species in the Alps, and are not proof of occurrence, thus underscoring its status as endemic to southwest Germany.