Trogloregions: delimiting subterranean faunistic identities in Afrotropics and Neotropics


Studies of macroecological patterns in subterranean fauna are challenging, since the environments where such species are found generally do not show the same ecological patterns observed on the surface, due to their isolation. Therefore, using data on the occurrence of invertebrate families, we tested the influence of ecological regions already established for surface environments (biogeographic domains, biomes, and ecoregions), lithology, and drainage basins as potential drivers of the similarity of invertebrate communities. We observed that within the surface ecoregions there might still be subdivisions due to different drainage basins, a pattern that was repeated in both aquatic and terrestrial fauna. Thus, we present different bioregions in the Afrotropics and Neotropics, in which caves have distinct faunistic identities. We discuss the biogeographic relationships between the epigean and hypogean environments that may be behind these patterns. We believe that these results can assist conservation strategies in which these different compositions are considered.


Introduction
The structuring of terrestrial ecosystems is determined by several biogeographic factors that define the diversity patterns of organisms 1 . The processes that rule these patterns are scale-dependent and hierarchical, ranging from biotic interactions at local scales to variations in environmental conditions at large scales 2,3 . In general, the terrestrial biota responds in a very similar way to the pressures imposed by environmental conditions 4 . Therefore, on a global scale, there is a huge variety of ecosystems shaped by historical changes in the environment, consequently presenting different biological identities 5 . The idea of representing these biological identities is old and over time it has been supported by different theories that tried to explain such patterns 6 . Previous classification systems were based on different attributes, such as temperature, precipitation, and vegetation structure 7,8 . Currently, however, these systems primarily delimit regions with geographically distinct groups of species, communities, environmental conditions, and a unique biogeographic history 5 . For this, species richness, endemism, and shared taxonomic groups, such as genera and families, are generally considered 5,9 . For terrestrial environments, the most widely used classification system currently corresponds to the ecoregions proposed by Olson et al. (2001) 5 .
Establishing this type of bioregional classification for subterranean environments represents an enormous challenge, since their ecological dynamics do not follow the same rules that structure surface environments, due to their relatively stable and highly selective conditions 11 . Furthermore, there are several other obstacles related to the lack of accurate information about the taxonomy of numerous groups, their evolutionary patterns, their actual distributions, and, consequently, the community composition as a whole 12,13 .
Despite the several limitations, investigations on this subject have shown promising results. For example, a pioneering work carried out in Germany by Stein et al. (2012) 14 showed that the compositional patterns of strictly aquatic cave fauna can differ significantly from the bioregional patterns observed on the surface. Poulson & White (1969) 15 theorized the general causes responsible for the differentiated diversity patterns observed for the subterranean fauna, guiding studies until today. According to these authors, in order of importance, these patterns would be guided by the time available between dispersion, isolation and speciation; by the spatial heterogeneity and resource availability in the occupied environment; and, finally, by the interactions between the organisms that compose the community. Cave communities would, therefore, represent a balance between colonization and extinction determined by variations in environmental conditions and dispersion, like an island 15,16 .
Since then, one of the most interesting patterns observed and described for cave communities is that they comprise large ecotones between epigean and hypogean environments 17,18 . Thus, caves would be the most easily accessible parts of a large network of intercommunicating empty spaces of the most diverse dimensions, presenting themselves as peripheral zones of much larger organism distributions 18 . Due to a balance between environmental stability and resource availability, this transition is more easily observed in the regions close to the cave entrances, which support high biodiversity that mixes components from the epigean and hypogean fauna 17 . The fact is that it is common for the taxa occurring along this interface between the surface and the underground to be phylogenetically related 15 . Furthermore, a substantial gradient of specializations and different niche occupations is observed in many taxa, evidenced by adaptations and behaviors according to the conditions prevailing in the occupied space [19][20][21] . The exception is those relict organisms from very old distributions whose surface relatives were driven extinct by some stochastic event 15 . Besides, in subterranean communities there is a positive relationship between local and regional taxa richness 11 , similar to the patterns observed in surface communities 22 , indicating that they are unsaturated with species 11 . It is discussed in the literature that this unsaturation indicates that subterranean communities may be influenced by the regional diversity, being somehow interconnected by species dispersion 11 . The definitive colonization of subterranean environments, however, is achieved by pre-adapted organisms capable of establishing themselves under total darkness and oligotrophic conditions, among other limiting characteristics 23 .
With all these assumptions, the established consensus is that subterranean communities are phylogenetically related sub-samples from the regional species pool present in epigean environments 24 .
However, in most cases, the tests of hypotheses related to the ecological patterns of subterranean communities have involved invertebrate fauna restricted to these environments (troglobites) [25][26][27] , strictly aquatic fauna (stygobites) 14,28,29 , or specific taxonomic groups 30,31 . This leaves a huge gap that has not been extensively explored yet, regarding the search for more general patterns of subterranean fauna, composed mainly of organisms not restricted to subterranean environments 15 . From the conservation perspective, the historical focus on obligate species with very restricted distributions may completely obscure the fact that caves are a large reservoir of biodiversity for invertebrates in general 32 . Thus, the greater part of the subterranean communities, even when composed of non-restricted species, is fundamental for the maintenance of the trophic chain dynamics, a fact commonly neglected 11 . In this context, biogeographic classifications focused on subterranean environments that consider their whole communities can be crucial for conservation issues, so that different regions with different biological attributes can be targeted, as recommended for terrestrial external ecoregions 33 .
The main objectives of our work were, therefore, to search for compositional identities of the subterranean invertebrate communities in caves from the Afrotropical and Neotropical regions and to indicate the factors responsible for them. These regions, in particular, have a related and unique biogeographic history, since they were connected and underwent similar paleoclimatic and geological changes that shaped their current landscape configuration as well as their faunal distribution 34,35 . To achieve these objectives, we used data on the presence and absence of invertebrate families recorded in caves in these two regions. We explored a set of five factors that allowed us to verify the possible influences of the surface regional pool of biological communities (tested with the delimitation of the biogeographical domains, biomes, and ecoregions 33 ) and possible influences of factors related to faunal dispersion (tested with the delimitation of drainage basins 36 ), respecting the different cave lithologies 37 . These five factors were explored both separately and nested, hierarchically encompassing the caves from more extensive to more spatially restricted scales. We considered two hypotheses a priori: H1) Possible large-scale compositional patterns could prove to be insignificant as each factor is considered in the analyses using data on the occurrence of invertebrate families. The geographic distributions of the families, which are generally broad 1,38 , could result in a low variability of the data and, consequently, in the non-significance of each factor. Therefore, an effective bioregional classification with these data would prove unfeasible; H2) Even with very simplified communities due to their highly restrictive and selective conditions 11 , on a continental scale the compositional patterns of cave fauna could still vary markedly, reflecting the influence of different factors that characterize the features of surface environments.
Consequently, the establishment of a bioregional classification based on these patterns could be efficient, even when using data on the occurrence of families.
Considering the results found and the assumptions presented in the discussion, we propose a new method of bioregional classification based on the similarity patterns of cave invertebrate communities, which defines areas of faunistic identity, here called Trogloregions.

Results
We observed 455 invertebrate families in the 234 caves considered in our dataset. On average, the number of families per cave was 26.05 (SD = ±13.85). More than half of the registered families belong to only five taxa: "Acari", Coleoptera, Araneae, Diptera, and Hemiptera (Fig. 1a). Among the 455 registered families, ten occur in more than 100 caves, indicating a wide distribution (Fig. 1b). Of these ten families, four belong to Araneae (Pholcidae, Theridiidae, Ctenidae, and Sicariidae). The other six are Phalangopsidae (Orthoptera), Formicidae (Hymenoptera), Tineidae (Lepidoptera), Staphylinidae (Coleoptera), Reduviidae (Hemiptera) and Psychodidae (Diptera). Pholcidae was the family with the highest number of records, occurring in 182 of the 234 caves. Approximately half of the families were registered in only one (135), two (63), or three caves (35).

Factor testing
We observed that in the biological community as a whole, all factors significantly represented the variations in the fauna compositional similarity (biogeographic region, lithology, biome, ecoregion, drainage basin, nested combination: PERMANOVA p < 0.001, Table 1; nMDS Fig. 2). However, it was the nested combination of these factors that best represented the similarity patterns found, consequently providing us with the most robust spatial aggregations of caves (ANOSIM global R = 0.732). Thus, initially, 37 sets of caves were formed according to the different levels of the factor corresponding to the nested combination. In each formed set, the caves belong to a given drainage basin, an ecoregion, a biome, a lithology, and a biogeographic domain, respecting the spatial hierarchy of their levels. Considering that spatially this combination corresponds to the smallest possible aggregations of caves according to the methodology employed, this means that within the ecoregions there may still be subdivisions in the compositional patterns due to the occurrence of different drainage basins. The dependence of these subsets on the ecoregions in which they occur was revealed when the drainage basins were tested separately, showing less explanatory power (PERMANOVA p < 0.001; ANOSIM R = 0.502). This hierarchical dependence was also evident when the ecoregions were tested separately, also showing comparatively inferior results (PERMANOVA p < 0.001; ANOSIM R = 0.638).
Given the very low correlation between the dissimilarity of cave fauna and geographical distance (DistLM p < 0.01, R² = 0.088), we consider the influence of spatial autocorrelation on similarity patterns observed at the family level to be negligible.
When we exclude singletons from the dataset and extend these analyses to the subsets of strictly terrestrial families and families with aquatic habits, we observe the same patterns indicated in the analyses with the complete database (supplementary material S1, supplementary tables 1, 2 and 3). The results of these different approaches highlight the robustness of our observations, reinforcing that the nested combination of factors best represented the similarity patterns of the subterranean invertebrate fauna, regardless of their habits.

Trogloregion delimitation
After testing each factor's significance and explanatory power and selecting the factor corresponding to the nested combination, the next step was to make pairwise comparisons among the levels of the nested combination factor. The nested combination of these factors, from the one with the highest number of levels to the one with the lowest number of levels (from "drainage basin", with 26 levels, to "biogeographic domain", with two levels), resulted in the distribution of the 234 caves in 37 groups. Thus, after the invertebrate fauna was attributed to the 37 cave groups corresponding to these levels, the comparisons between these groups grouped them into 17 "supergroups" (SIMPROF p < 0.01), as shown in the cluster analysis in Fig. 3. In other words, we found different regions in which caves present distinct faunal compositions considering their invertebrate families. The same 17 supergroups were obtained when we re-ran the SIMPROF with p < 0.05, and when we used the Jaccard index instead of the Sørensen-Dice index, with p < 0.01 and p < 0.05. Again, the maintenance of the same patterns in these different approaches reinforces the consistency of the results.
We adjusted the delimitation of certain supergroups resulting from the SIMPROF analysis due to the large geographical distance or spatial overlap of supergroups due only to the different lithologies (see methods section). Therefore, we have defined a total of 18 Trogloregions, whose names are based on remarkable regional characteristics. They are Kalahari, Kumasi, Acra, Lake Volta, Cape Town, Areia Branca, Upper Ribeira, Atlantic Highlands, Lower Rio Pardo, Upper Paraná, Espírito Santo, Eastern Amazon, Petén-Veracruz, Etosha, Lower São Francisco, Central Brazil, Lower Paranaíba and Apodi-Mossoró (Fig. 5). The Trogloregions' borders correspond to the cutouts of the overlays of drainage basins with the ecoregions, in a nested way. It is important to note that ecoregions are naturally already nested within biomes and biogeographic domains, following their borders.
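The agreement between the two indices can be related to how they are defined. As a brief worked note (the symbols a, b and c are introduced here for illustration and do not appear in the original text): let a be the number of families shared by two caves, and b and c the numbers of families found in only one of them. The two similarities are then linked by a strictly increasing transformation, so both rank pairs of caves in the same order:

```latex
S_{\text{Sørensen-Dice}} = \frac{2a}{2a + b + c}, \qquad
J_{\text{Jaccard}} = \frac{a}{a + b + c}
\quad\Longrightarrow\quad
J = \frac{S}{2 - S}, \qquad
\frac{\mathrm{d}J}{\mathrm{d}S} = \frac{2}{(2 - S)^{2}} > 0 .
```

Rank-based procedures are therefore unaffected by the choice between the two. Procedures that use the similarity values themselves can still differ in detail, so the identical supergroups reported above remain an empirical result rather than a mathematical necessity.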
Finally, the ANOSIM testing the 37 sets of caves (resulting from the nested combination of factors) using the supergroups indicated by CLUSTER and SIMPROF as the grouping factor, which served as the basis for the delimitation of the Trogloregions, obtained an R = 0.923 (PERMANOVA p < 0.001), a value much higher than that of all other factors tested in our initial explorations (Fig. 4, Table 2).
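For readers less familiar with the ANOSIM statistic reported above, the following is a minimal sketch of how the global R can be computed from a dissimilarity matrix and a grouping factor. It is not the software used in the study; the function names and the toy matrix are assumptions made for illustration only. R contrasts the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities, and approaches 1 when groups are well separated.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..m, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def anosim_r(d, labels):
    """Global ANOSIM R = (mean between rank - mean within rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([d[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2.0)


# Hypothetical dissimilarities among four sets of caves in two supergroups.
toy_d = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
toy_labels = ["supergroup_1", "supergroup_1", "supergroup_2", "supergroup_2"]
toy_r = anosim_r(toy_d, toy_labels)
```

In this toy case every between-group dissimilarity exceeds every within-group one, which is the situation in which R reaches its maximum. A significance test would additionally permute the labels, which is omitted here for brevity.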

Discussion
The results corroborated our second hypothesis. We verified that although the caves present only a filtered portion of the biodiversity observed in the surface ecosystems, on continental scales the compositional patterns of the subterranean communities can vary dramatically. As a result, caves from different regions may have singular faunistic identities. According to the results, this occurs due to different attributes of the external environment, such as the bioregional delimitation of epigean ecosystems with their potential colonizers (here tested with the delimitation of ecoregions, biomes and biogeographic domains) and according to divergent drainage flows (here tested with the delimitation of the drainage basins). Essentially, all the approaches showed that within the ecoregions there may still be subdivisions in the compositional patterns due to the different drainage basins, respecting the different lithologies. In addition to subterranean communities as a whole, these patterns were also evident both in strictly terrestrial communities and in organisms that are strictly aquatic or associated with aquatic habitats.
Such results, however, need to be interpreted with caution, especially regarding their biogeographical context. The first point to be noted is the large spatial discontinuities between the caves covered by this study, which are inevitable because regions that are prone to the occurrence of caves are not homogeneously distributed in space 39 . Besides, the subterranean biodiversity of tropical regions is still little known or underestimated when compared to certain locations in the northern hemisphere 40 . The second point concerns the use of family occurrence data. Naturally, families have older origins and their occurrences reflect more generalized and less restricted spatial patterns than those of genera or species 1,38 . Still, family occurrence data can show enough variability and be efficient in showing variations in ecological patterns, being a viable alternative for quick access to biodiversity [41][42][43] . In this work, this prediction was corroborated. A positive consequence of using family data is that their occurrences may reflect different biogeographic or paleoclimatic scenarios that have shaped their current distributions, connecting different regions that were related in the past and currently house them 1,38 .
Although subterranean communities may be determined by regional and local processes as in any other external environment, these communities are generally very simplified due to the highly limiting conditions and the relative environmental stability, factors that end up reducing the genetic variability of the populations 11,15 . Environmental stability is a key point, as subterranean communities can be directly affected by environmental fluctuations in both the external and subterranean environments 30,31,44 . Apparently, environmental stability is one of the main "attractors" for populations that may come and colonize these environments, ceasing the selection of characteristics that can be advantageous under the great unpredictability of the surface environment 15 . Considering the cave environment, organisms can, for example, adapt their distributions according to fluctuations in temperature or relative humidity, which are much less pronounced than in the external environments 44,45 . Regarding the environmental stability in the external habitats, climatic oscillations on the surface can extinguish epigean populations, while hypogean populations can be spared, resulting in their isolation 30,31 . Therefore, it is argued that the maintenance of greater subterranean biodiversity in certain regions may be the result of constant precipitation rates in the long term, favoring a greater resource availability to the hypogean environment 27 . It has also been found that changes in temperature on the surface are one of the main factors responsible for variations in the similarity patterns of subterranean fauna, especially for nonspecialized fauna, providing more similar communities in more climatically similar regions 46 . Thus, it is very likely that the patterns verified with the delimitation of the Trogloregions were partly shaped by the environmental transformations and characteristics of the surface.
To support this hypothesis, there is evidence that certain groups of arthropods (Araneae, Sicariidae: Sicarius and Hexophthalma 47 , and Psocoptera, Prionoglarididae: Afrotrogla, Neotrogla, and Sensitibilla 48 ) that are also found in caves are restricted to the most arid parts of the Afrotropical and Neotropical regions. Their distributions have been confined to regions of greater aridity since the vicariance event, when the distribution of their ancestors diverged 47,48 . According to our database, the genera from these two families, for example, occur largely in the caves of the "Lower São Francisco" and "Etosha" Trogloregions. It is interesting to note that the structure of the communities of these Trogloregions is highly similar, as shown by the cluster analysis (Fig. 3, groups 16 and 26, respectively). This is most likely due to the fact that these regions have similar environmental conditions, varying from arid to semi-arid 49 .
For the genera of Prionoglarididae, the fact that they evolved under semiarid climate and oligotrophic conditions is even related to the coevolution of some of their adaptations 50 . For the Sicariidae genera, this is related to the maintenance of their ecological niche 47 . Certainly, this comparative exercise can also be carried out between caves of other Trogloregions, provided that phylogenetic and biogeographic studies of the groups are taken into account, as in the example of the families previously mentioned.
In this same line of thought, it is broadly accepted that, over geological time, different regions of the planet were marked by changes in their environmental conditions and biogeographical histories that affected the structuring of their ecosystems 51 . These factors were taken into account for the delimitation of the surface ecoregions, in addition, of course, to their distinct ecological communities 5 . In our results, we found that the borders of the Trogloregions also followed, in part, the borders of the ecoregions. As discussed above, we believe that this is at least partially due to the presence of potential colonizers that the different ecoregions can provide to the caves occurring within their boundaries, especially considering that the whole community was analyzed, not just the troglobitic/stygobitic organisms. Although these relationships are indirect, they indicate that the features of surface ecosystems can predict similar patterns of subterranean fauna in different regions. How, and which, environmental conditions were determinant for these patterns are questions that deserve further research.
Another fundamental factor observed in our analyses concerns the different drainage basins nested in the ecoregions. Hydrology is a central component for the maintenance of all subterranean dynamics, being determinant for resource availability, as water flows can carry particulate or dissolved organic matter 52,53 , and also determining patterns of faunistic distribution 15,29,54 . Caves placed in larger drainage basins tend to have greater faunal diversity than caves located in smaller drainages 15 .
Furthermore, the distribution of subterranean fauna tends to be greater in large drainages due to the greater chances of dispersion 15 . In our work, we found that in some situations, within the same ecoregion there may be groups of caves with dissimilar faunistic compositions because they occur in different drainage basins. Most likely, this may be the result of divergent drainage flows on large geographic scales, guiding the distribution of certain invertebrate groups to the caves downstream from the boundaries of their respective drainages. A very interesting example that strengthens this idea comes from a study carried out on the border between Italy and Slovenia, using two species of aquatic isopods of the genus Asellus 29 as a model. In that region, molecular data showed that the species distributions coincided with the delimitation between the Reka and Pivka river drainages 29 . Another example comes from a study carried out in Germany, where the similarity of aquatic fauna between different regions proved to be dependent on the relief configurations, with connections across large valleys (Rhine River Valley) and between old Pleistocene basins 14 .
It is important to note that the examples mentioned above apply strictly to aquatic cave fauna. However, we also verified that the similarity patterns of strictly terrestrial fauna depend on drainages. Invertebrate fauna, in general, can be transported through the interstices in the rock matrix, between interconnected cave systems along drainages and from the surface to the underground, being carried involuntarily by the natural water flow and floods 23,55 . After floods, for example, the structure of the cave community can be altered and the diversity can increase momentarily because organisms from the epigean fauna are carried in by the water flow from the external environment 55 . In karst landscapes, however, on local scales the boundaries of surface drainages do not always coincide with the boundaries of subterranean drainages 56 . In addition to the water infiltration being quite accentuated in these landscapes due to their high porosity, subsurface flows may differ from the surface runoff patterns due to the arrangement of aquifers, which may or may not be confined within the geological barriers 57 . On the other hand, on a regional scale, information about surface drainage is still essential, as it may show zones of allogenic recharge of subterranean aquifers, where the water flow may come from other non-karstic reliefs 58 . In this sense, our analyses showed that some groups of caves in adjacent basins were grouped according to the similarity of their fauna. As discussed, one possible explanation is the dispersion of fauna between them, due to subsurface drainage flows that are not aligned with surface flows. However, this is another point that needs to be more adequately explored in the future on more refined spatial scales for a more precise diagnosis.
As shown, caves from different regions can have completely different communities and this certainly needs to be taken into account when preservation strategies are devised. Considering the simplicity of the trophic dynamics characteristic of subterranean environments 11 , the different faunal compositions among Trogloregions imply that the trophic functions in these environments are performed by different taxa and guilds in each of them. This represents a new range of hypotheses to be explored and considered in other projects, reinforcing the potential that caves and their biological communities offer for ecological, biogeographic, and evolutionary studies 24 . On the other hand, all this potential contrasts with the unbridled exploitation of the natural resources in the landscapes related to the caves, which locally can represent the complete destabilization of these admittedly fragile and little known ecosystems 12 . This is probably the biggest practical implication of our results: the indication that the local loss of subterranean communities can be irreparable and without equivalents on a regional and even continental scale.
As an example, in Brazil, environmental defense mechanisms recommend that enterprises that cause environmental impact should present compensatory measures to mitigate it (Law Number 9985, 2000 59 ). However, in the legislation that regulates the exploitation of karst landscapes, there is no mention of compensatory measures that respect the faunistic identities that different regions may present (Decree No. 6640, 2008 60 ). Another alarming gap is the absence of any mention of measures aimed at maintaining hydrological dynamics in subterranean environments and karst landscapes, which contrasts with the fact that these landscapes are an important natural freshwater reservoir 61 . This gap needs to be filled, as our findings show that drainage basins are one of the central pillars of large-scale subterranean fauna compositional patterns.
In the context of subterranean environment conservation, a question inevitably arises: if caves in a certain area have been destroyed, which caves in other nearby regions or geological units may harbor fauna of similar composition and could be targeted for preservation actions? To solve this type of problem, methodologies similar to those we used for the delimitation of Trogloregions can represent an important guide for decision making in any region of the planet where they are replicated. We encourage the scientific community to replicate this methodology and improve it where possible so that the understanding of the macroecological patterns of subterranean communities can be tested, interpreted and disseminated in a more feasible way to assist conservation actions. The bioregional delimitation of surface ecosystems has already proved to be fundamental to verify which ecoregions on the planet need greater attention for preservation actions 33 . For subterranean environments, we hope to have taken some steps that can help in this task.

Study area
For the survey of biotic and abiotic data, we considered 234 caves in the Afrotropical and Neotropical regions (supplementary material S2), which are found in five countries: South Africa, Brazil, Guatemala, Ghana, and Namibia. These caves are concentrated between 17° N and 26° S latitudes (supplementary material S1, supplementary
Having met all the criteria, we used the data on the occurrence of invertebrate families of these caves. The family level was used for two main reasons. First, families have older origins and generally broader distributions than genera and species, enabling comparisons between areas on a wide geographic scale 1,38 . Second, we assumed that the sampling and group identification biases have been reduced and homogenized with these measures. Therefore, further taxonomic refinements were avoided. Such refinements have a greater probability of identification errors and uncertainties regarding the distribution of species or genera. These problems have been reported as a major obstacle to studies in invertebrate ecology 32,68 . The nomenclature of families included in the construction of the database was checked in the literature to verify their current taxonomic situation, which allowed us to correct possible synonyms or reclassifications. In this case, just for the construction of the graphs in figure 1, 57 families were assigned to the taxon "Acari", which is a generic term used to designate six orders of arthropods whose phylogeny is still not well resolved 69 .

Abiotic data
The abiotic data encompassed information about the surface ecological regions, lithologies, and drainage basins that overlapped the locations of the caves. The delimitations of the ecological regions were obtained from shapefiles with the biogeographical domains, biomes, and ecoregions, on the Ecoregions2017 33 platform. The delimitations of the drainage basins were obtained from shapefiles on the HydroSHEDS 36 platform, where the drainage borders are provided in files with the name "bas" for each continent 70 . These delimitations indicate divergent drainage flows in the landscape, according to water flow modeling from radar images from the Shuttle Radar Topography Mission (SRTM) 36 . For contextualization, the definition of the drainage basin we used was that of "a landscape in which the surface waters converge to a single location, such as a point in a stream or river, or a single wetland, lake or other body of water" 71 .
The locations of the 234 caves were surveyed in the articles considered and in the CEBS database. These locations were then projected in decimal degrees using the WGS84 datum to meet the specifications of the previously mentioned shapefiles. From the locations, the values for each shapefile were extracted with the Point Sampling Tool from QGIS 3.8.3 software 72 , and later used in the other steps. The caves, according to information gathered from the CEBS database and from the literature used, belong to carbonate, siliciclastic and granitoid lithologies (Dataset S1).

Statistical procedures
Before testing our hypotheses, we first verified a possible effect of spatial autocorrelation on the similarity patterns of cave fauna, where considerable positive effects are recurrent in this field of study due to the restricted patterns of subterranean fauna dispersion 13 . For this, we used Distance-based Linear Models (DistLM) to calculate a simple regression 73 between the similarity matrix, obtained with the Sørensen-Dice index, and the geographic distance matrix, obtained with the georeferenced locations of the caves. For these cases, these multivariate models are treated as a more robust alternative to the Mantel test 73 , which is often used to test such influences 74 . In the regression obtained with the DistLM, the criterion used for the calculation was R² 73 , with the X and Y coordinates treated as a "geographical distance" indicator and being considered the predictor variable.
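To give an intuition for this check, the sketch below regresses pairwise Sørensen-Dice dissimilarity on pairwise great-circle distance and returns the R² of that simple regression. This is a deliberately simplified stand-in and not DistLM itself, which partitions multivariate variation with the coordinates as predictors; it would not be expected to reproduce the reported value. The coordinates, family lists and function names are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used as an approximation


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in decimal degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sorensen_dice_dissimilarity(a, b):
    """1 - 2|A n B| / (|A| + |B|) for two non-empty sets of family names."""
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)


def distance_decay_r2(coords, faunas):
    """R² between geographic distance and faunal dissimilarity over all cave pairs."""
    geo, bio = [], []
    for i, j in combinations(range(len(coords)), 2):
        geo.append(haversine_km(coords[i], coords[j]))
        bio.append(sorensen_dice_dissimilarity(faunas[i], faunas[j]))
    return r_squared(geo, bio)


# Hypothetical caves: (latitude, longitude) and the families recorded in each.
toy_coords = [(-15.0, -45.0), (-15.5, -44.5), (-20.0, 17.0), (5.5, -0.2)]
toy_faunas = [
    {"Pholcidae", "Ctenidae", "Tineidae"},
    {"Pholcidae", "Ctenidae", "Formicidae"},
    {"Pholcidae", "Sicariidae"},
    {"Formicidae", "Tineidae", "Psychodidae"},
]
toy_r2 = distance_decay_r2(toy_coords, toy_faunas)
```

Because the pairs in such a regression are not independent, significance should come from a permutation procedure rather than from the usual parametric test.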

Factor testing
The analysis of the factors was carried out systematically, respecting their spatial hierarchy, so that we could filter all possible significant variations within the groups of caves according to the test of each factor.
To form spatial aggregations of caves based on the factors that best represent variations in compositional similarity, we explored five different factors, both separately and combined, with their number of levels in parentheses (infographic in figure 6): biogeographic domain (2), lithology (3), biome (5), ecoregion (16), and drainage basin (26). We selected these five factors because of their reported influence on cave communities: subterranean communities can be influenced by cave lithology 37 ; by the regional pool of surface species 24 , an influence that was tested here using the boundaries of the biogeographic domains, biomes, and ecoregions; and finally, by the drainage flows in the landscape, whose influence had until then been reported only for strictly aquatic cave organisms 29,54 .
The similarity between cave fauna was obtained using the Sørensen-Dice index, which is less subject to loss of sensitivity in highly heterogeneous datasets 75 . In addition to being heterogeneous, we found in our data a large asymmetry in the experimental design, where there was a variation in the number of levels of a nested factor within the higher-level factor. For these reasons, we tested significant differences in the faunistic composition of the caves grouped by each factor with the Permutational Multivariate Analysis of Variance (PERMANOVA) in two different designs. PERMANOVA was chosen because it allows testing of the effect of factors by obtaining p-values in highly unbalanced experimental designs, enabling more robust interpretations for these cases 73,76 .
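To make the logic of this test concrete, the following is a minimal sketch of a one-way PERMANOVA on Sørensen-Dice dissimilarities with unbalanced groups. It is not the PRIMER implementation and does not reproduce nested designs or Type III sums of squares; the family labels, group labels, and function names are hypothetical.

```python
import random
from itertools import combinations


def sorensen_dice_dissimilarity(a: set, b: set) -> float:
    """1 - 2|A and B| / (|A| + |B|) for two sets of family names."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square dissimilarity matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=9999, seed=0):
    """Observed pseudo-F and p-value from unrestricted label permutations."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical presence data: one set of family labels per cave.
families = [
    {"f1", "f2", "f3", "f4"}, {"f1", "f2", "f3"}, {"f1", "f2", "f4", "f5"},
    {"f2", "f3", "f4"}, {"f1", "f3", "f4", "f5"},          # five caves, group A
    {"f6", "f7", "f8", "f1"}, {"f6", "f7", "f9"}, {"f6", "f8", "f9", "f2"},
]                                                          # three caves, group B
labels = ["basin_A"] * 5 + ["basin_B"] * 3  # deliberately unbalanced
dist = [[sorensen_dice_dissimilarity(a, b) for b in families] for a in families]
f_obs, p_value = permanova(dist, labels)
```

The p-value is the share of permuted label assignments whose pseudo-F is at least as large as the observed one, which is why the approach tolerates unequal group sizes.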
In the first design, p-values were obtained for each factor separately after 9999 permutations. In the second design, the factors were combined hierarchically. The nested combination of these factors, from the one with the highest number of levels to the one with the lowest (from "drainage basin", with 26 levels, to "biogeographic domain", with two levels), resulted in the distribution of the 234 caves into 37 groups. Therefore, the caves of each group belong to a combination: drainage basin (Dr) "A", ecoregion (Ec) "B", biome (Bi) "C", lithology (Li) "D" and biogeographic domain (Do) "E" (see the schematic model in figure 7). Despite the low influence shown by the lithology factor on the similarity patterns (PERMANOVA p < 0.001, ANOSIM R = 0.063), we opted to maintain this factor in the nested combination with the other tested factors. This standardization of cave groups regarding their lithology is important, as this factor may have a more evident influence at taxonomic levels below that of invertebrate families 37 .
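The nested combination can be pictured as grouping caves by the tuple of their factor levels. The sketch below assumes each cave is a record with one value per factor; the field names and level labels are hypothetical placeholders, not values from Dataset S1.

```python
from collections import defaultdict

# Factors ordered from the most levels to the fewest, as in the text.
FACTORS = ("drainage_basin", "ecoregion", "biome", "lithology", "domain")

# Hypothetical records, not data from the study.
caves = [
    {"cave": "c1", "drainage_basin": "Dr_A", "ecoregion": "Ec_B",
     "biome": "Bi_C", "lithology": "carbonate", "domain": "Do_E"},
    {"cave": "c2", "drainage_basin": "Dr_A", "ecoregion": "Ec_B",
     "biome": "Bi_C", "lithology": "carbonate", "domain": "Do_E"},
    {"cave": "c3", "drainage_basin": "Dr_A", "ecoregion": "Ec_B",
     "biome": "Bi_C", "lithology": "siliciclastic", "domain": "Do_E"},
    {"cave": "c4", "drainage_basin": "Dr_F", "ecoregion": "Ec_B",
     "biome": "Bi_C", "lithology": "carbonate", "domain": "Do_E"},
    {"cave": "c5", "drainage_basin": "Dr_F", "ecoregion": "Ec_B",
     "biome": "Bi_C", "lithology": "carbonate", "domain": "Do_E"},
]


def group_caves(records):
    """Map each distinct combination of factor levels to its list of caves."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[factor] for factor in FACTORS)
        groups[key].append(rec["cave"])
    return dict(groups)


groups = group_caves(caves)
```

In this toy case, two caves that share basin, ecoregion, biome, and domain still fall into different groups when their lithology differs, which mirrors the decision to keep lithology in the combination.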
With the significance assessed, the global R from the Analysis of Similarity (ANOSIM) was used to verify the quality of the cave groups formed according to the factors 14 , where the higher the global R, the greater the difference between the levels of the factor 73 . As with PERMANOVA, here we tested the factors separately and combined hierarchically. For ANOSIM, 999 permutations were used.
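For readers unfamiliar with the statistic, the global R compares the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities, scaled by M/2, where M is the number of sample pairs. The sketch below is a simplified one-way version under that standard definition, with hypothetical dissimilarities and labels; it is not the PRIMER routine.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def anosim_r(dist, labels):
    """Global R = (mean between rank - mean within rank) / (M / 2)."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2.0)


def anosim(dist, labels, permutations=999, seed=0):
    """Global R and a p-value obtained by shuffling the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical dissimilarities among four caves (not data from the study).
dist = [[0.0, 0.1, 0.6, 0.7],
        [0.1, 0.0, 0.8, 0.9],
        [0.6, 0.8, 0.0, 0.2],
        [0.7, 0.9, 0.2, 0.0]]
labels = ["basin_A", "basin_A", "basin_B", "basin_B"]
global_r, p_value = anosim(dist, labels)
```

Values of R near 1 indicate that all within-group dissimilarities are smaller than any between-group dissimilarity, and values near 0 indicate no separation between groups.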
In addition to being applied to the whole biotic dataset, the aforementioned analyses were applied separately to strictly terrestrial groups, to aquatic groups (here, we consider as aquatic those families with species that need or live associated with water bodies during at least some stage of their life cycle), and to the dataset excluding singletons. We decided to show only the results of the complete dataset in the main text because of its greater robustness, since the other approaches showed the same patterns (supplementary material S1, supplementary tables 1, 2 and 3). For these reasons, we carried out the steps described below only for the complete biotic dataset.
For this set, we graphically represented the dissimilarities between the faunistic composition of the caves with Non-metric Multidimensional Scaling (nMDS), indicating the factors separately and combined hierarchically.

Cave groups delimitation
According to the ANOSIM results, the use of the combined factors was selected due to the higher global R value. The presence/absence of family data was then computed at the 37 levels resulting from the nested combination of factors, with each level representing a distinct group of caves. These cave groups served as the base spatial unit for the similarity test between the different regions covered, which was carried out through Cluster analysis (CLUSTER) along with Similarity Profile analysis (SIMPROF), both based on the Sørensen-Dice similarity index. The CLUSTER was obtained using the average linkage clustering method, condensing the samples from the closest neighbors by pairwise comparisons 77 . The SIMPROF analysis added to the CLUSTER is a robust, permutation-based alternative for condensing groups or samples whose similarity structure is unknown a priori 77 . With 999 permutations and significance levels of 1% (results in the main text) and 5% (supplementary material S1, supplementary figure 2), SIMPROF indicated the formation of 17 "supergroups" in which caves have statistically distinct faunistic compositions, thus establishing large regional groups of invertebrate families. Based on these 17 supergroups, we carried out the final Trogloregion delimitation.
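The average linkage step can be illustrated with a short, unoptimized sketch: at each iteration, the two clusters with the smallest mean pairwise dissimilarity are merged. The SIMPROF permutation test is not reproduced here, and the group names and dissimilarities are hypothetical.

```python
from itertools import combinations


def average_distance(dist, members_a, members_b):
    """Mean pairwise dissimilarity between two clusters (average linkage)."""
    total = sum(dist[i][j] for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def upgma(dist, names):
    """Agglomerate by average linkage; return (members, height) per merge."""
    clusters = {i: [i] for i in range(len(names))}
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average dissimilarity.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: average_distance(
                       dist, clusters[pair[0]], clusters[pair[1]]))
        height = average_distance(dist, clusters[a], clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        merges.append((sorted(names[i] for i in merged), height))
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical dissimilarities among four cave groups (not study data).
names = ["g1", "g2", "g3", "g4"]
dist = [[0.0, 0.1, 0.8, 0.9],
        [0.1, 0.0, 0.7, 0.6],
        [0.8, 0.7, 0.0, 0.2],
        [0.9, 0.6, 0.2, 0.0]]
merges = upgma(dist, names)
```

In the workflow described above, a test such as SIMPROF would then decide at which of these merges the internal structure stops being statistically distinguishable.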
Finally, with the 37 initial groups resulting from the nested combination of factors, we calculated another PERMANOVA (9999 permutations, Type III Partial and Unrestricted Permutation of Raw Data criteria) and ANOSIM (999 permutations) using the 17 supergroups indicated with CLUSTER and SIMPROF as the clustering factor. The dissimilarity patterns between these groups were plotted with an nMDS.
To cross-check the supergroups used to delimit the Trogloregions, we ran another CLUSTER and SIMPROF with 1% and 5% significance levels, this time using the Jaccard index (supplementary material S1, supplementary figures 3 and 4). The Jaccard index gives the same weight to shared and unique biological groups, resulting in comparatively lower similarity values 75 . However, these new tests returned the same supergroups, unchanged. Thus, we decided to keep in the main text the results obtained with the Sørensen-Dice index at p < 1%, which presented more robust values.
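The relation between the two indices can be written out explicitly. Using the usual presence/absence notation, which is an assumption here rather than notation from the source, let a be the number of families shared by two groups and b and c the numbers found in only one of them:

```latex
S = \frac{2a}{2a + b + c}, \qquad
J = \frac{a}{a + b + c}, \qquad
J = \frac{S}{2 - S} \le S .
```

Because J is a strictly increasing function of S on [0, 1], both indices rank every pair of groups in the same order, while J is never larger than S. This is consistent with the lower similarity values noted above. Identical rankings do not by themselves guarantee identical average linkage clusters, so the agreement between the two analyses remains an empirical result.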
All the above-mentioned analyses were performed in the PRIMER 6 software with its PERMANOVA+ add-on 73 .

Map construction
Maps delimiting Trogloregions were constructed respecting the hierarchical sequence of the five factors used in the other analyses. Therefore, the initial cutouts (37, according to PERMANOVA and ANOSIM) consisted of the intersections of the ecoregion vector layers with the overlapping drainage basins (the ecoregions are already naturally nested in the biomes and biogeographical domains). These outlines were obtained with the Intersection vector tool 72 . Discontinuities in vector layers without caves were disregarded, and only features with caves were selected. For this, the Multiparts to Single Parts vector tool was used 72 .
From the outlines, it was possible to select and combine the 37 groups of caves indicated by PERMANOVA and ANOSIM into the 17 supergroups indicated by the CLUSTER and SIMPROF analyses. For the construction of the maps, we split two of the 17 supergroups because they combined caves from different ecoregions and basins whose borders are separated by a great geographical distance. First, we split the supergroup that combined caves from the Cerrado ecoregion with caves from the Bahia coastal forests ecoregion (formerly the "i" supergroup, now the "Lower Rio Pardo" and "Upper Paraná" Trogloregions). The other split supergroup had combined caves in the Caatinga ecoregion, in the Neotropical region, with caves in the Angolan mopane woodlands ecoregion, in the Afrotropical region (formerly the "n" supergroup, now the "Kalahari" and "Lower São Francisco" Trogloregions). Four "supergroups" in the CLUSTER/SIMPROF (A, C, F, and K) have only one cave each and are considered outliers 78 because they have very different compositions, which did not allow their combination with the other supergroups according to the analyses. Of these four, supergroup "k" was combined with "l" because they overlap spatially, sharing all attributes except lithology (supergroup k has a limestone lithology, while l has a siliciclastic lithology), thus forming the "Eastern Amazon" Trogloregion. After these adjustments, the maps indicate 18 Trogloregions, which were named after a striking geographic feature at their locations.
The shapefile resulting from the map construction and Trogloregion delimitation, whose attribute table contains all information on the combined factors, is available as supplementary files (supplementary material S3, and supplementary material S1, supplementary table 4), for application in future studies and sharing with the scientific community.
The steps described for the construction of the map were performed using QGIS 3.8.3 72 .

The authors declare no competing interests.

Trogloregions was based on these 17 "supergroups". Results with a p < 5% cut with the Sørensen-Dice index and results using the Jaccard index with p < 1% and 5% cuts remained unchanged (see supplementary material S1, supplementary figures 2, 3 and 4).

Infographic outlining the analyses performed according to our main questions, from the dataset of invertebrate family occurrences in 234 caves. In (a), we tested the factors separately and combined hierarchically according to their number of levels. Here, these levels are exemplified as "pieces" in a puzzle. After establishing which factor best represented the variation in the biotic data, in (b) these data were assigned to each piece, so that we could test their combination according to their compositional similarity, assembling the Trogloregion maps afterward. All analyses presented were based on data on the presence/absence of invertebrate families and used the Sørensen-Dice index. The protocols and data treatment inherent to the performance of each analysis are described in detail in the methods section.

Supplementary Files
This is a list of supplementary files associated with this preprint.
s1supplementarymaterialtrogloregionsalvarengaetal2020.docx
s2bioticabioticfactorstrogloregionsalvarengaetal2020.xlsx
s3trogloregionsshapefilealvarengaetal2020.rar