We followed six major steps for studying the effects of climate and land-use change on food web properties (Fig. 1). Step 1, we compiled the empirical species interaction data of terrestrial vertebrates and constructed the empirical metaweb of observed trophic interactions for species worldwide. Steps 2 and 3, we constructed a deep learning model with species interactions and traits, and predicted the global metaweb with potential interactions. Steps 4 and 5, the current and future distributions of all species were mapped through species distribution models with climate variables and refined by species habitat preference and land-use changes. Step 6, the local food webs in the contemporary era and under different future scenarios were inferred and mapped by matching species distributions and interactions from the predicted metaweb. After these steps, we depicted the food-web metrics and compared their differences between current and future food webs.
Interaction data compilation and empirical metaweb
To assemble a species-level trophic metaweb of global terrestrial vertebrates, we integrated multiple sources of information about the diet of terrestrial vertebrates from online databases and peer-reviewed articles (Fig. 1, Step 1). The databases included TETRA-EU (a continental-scale, species-level trophic metaweb for European tetrapods 66), CarniDIET 69, The Avian Diet Databases 70, SquamataBase 71 and Globi 72 (Global Biotic Interactions datasets, filtered by interaction types of “eats,” “preysOn,” and “kills”). These sources were complemented by a comprehensive review of the diet of terrestrial vertebrates. For each carnivorous terrestrial vertebrate in China, we searched the major Chinese literature website (CNKI) for the species name and “diet” to find available studies and obtained species-level trophic interaction records in these studies (The full list of the Chinese articles can be found in Table S1). The compiled interaction database covered 5808 vertebrate species and 96228 links, including 2879 species from the Globi database with 5967 links, 1138 from the TETRA-EU dataset with 81196 links, 847 from the CarniDIET dataset with 2320 links, 3341 from the The Avian Diet Database and SquamataBase with 6372 links, and 230 from the Chinese literature with 373 links (Table S1). All the interaction records constitute an empirical trophic meta-food web, referred to as the “empirical metaweb.” Cannibalistic links were excluded. We realize this metaweb is far from complete in the terms of the species and trophic links in terrestrial ecosystems on earth. However, it is constructed from the best information available and provides predictions of potential interactions among vertebrates that are more detailed than other existing studies.
Species trait data
Species trait data were extracted from multiple trait databases, COMBINE 73, EltonTraits 74, amniote life-history database 75, traits of lizards 76, AmphiBIO 77 and Amphibian traits database 78. It is intractable to integrate all the trait fields from all databases considering that different databases emphasize diverse aspects of the traits. In addition, most databases only included one to two classes in terrestrial vertebrates. Therefore, we chose to use the shared fields in these databases. Seven traits covering morphology, life history and ecology were chosen, including adult mass, adult body length, maximum longevity, female maturity (the length of time needed for a female to reach sexual maturity), litter size, preferred diet and activity period (circadian paradigm, such as nocturnal, diurnal, crepuscular, cathemeral or other).
Because the completeness of the traits in each database varied among the four animal classes (see Table S2 for details), we imputed missing values in two steps. First, considering the vital importance of the body mass and body size in shaping the ecological network structure 79, we imputed the missing body mass values of amphibians through the known allometric relationship between snout to vent length (SVL) and body mass 80. Then we performed the trait imputation using the missForest algorithm 81 with phylogenetic information among species using phylogenetic eigenvector. Since species without DNA information are not fixed in their taxonomic constraints, no tree can meaningfully summarize their uncertainty 82. The consensus phylogenetic trees of Mammalia, Squamata and Aves with branch lengths were generated using 100 random credible trees from Vertlife using the R package ape 82–84. The imputation process was performed within each class. Previous studies found that this phylogenetic-based trait data imputation method is a robust way to impute trait values 85,86. The consensus tree of amphibians based on all credible sets was directly obtained from published data 87. The imputations for Crocodylia and Testudines were based on their species-level phylogenetic tree 88. We also integrated activity data for more than 700 snake species 89 and searched the diet and activity data for all Crocodylia and Testudine species with the Animal Diversity Web 90. Before the missForest imputation step, we tested phylogenetic signals of seven traits based on Abouheif’s Cmean statistic with 999 permutation tests 91,92 with R package phylosignal93. We found significant phylogenetic signals for the seven traits (p < 0.01 for all estimated traits, Table S3), which supports our approach to impute missing traits using phylogenetic information.
Species range data and habitat refinement
The geographical range data of mammals and amphibians were retrieved from IUCN Red List websites (https://www.iucnredlist.org) and bird range maps were obtained from Birdlife International (https://www.datazone.birdlife.org). The range data of reptiles were obtained from the GARD 1.7 dataset (Global Assessment of Reptile Distributions), which is collated from various sources and validated by experts 94,95. Marine vertebrates (N = 137) were not considered. Spatial areas representing nonextant status were removed in the data cleaning process. These range maps were refined to species’ suitable habitat maps by combining species’ IUCN habitat preferences (https://www.iucnredlist.org) and the global map of terrestrial habitat types developed by Jung et al.96.
Environmental data
For modeling species climatic niches, current and future bioclimate variables were obtained from the WorldClim2.1 database (https://www.worldclim.org) 97. Five bioclimatic variables: mean annual temperature (BIO1), temperature seasonality (BIO4), annual precipitation (BIO12), precipitation seasonality (coefficient of variation; BIO15) and precipitation of the driest quarter (BIO17) were selected to reduce the impact of internal variability of and reduce collinearity (Pearson correlation < 0.8, Fig. S1) among climate variables in the following species distribution models (SDMs) 98,99. For simplicity, we did not choose different climatic factors for each animal group because previous studies have shown that the main tetrapod groups have general “rules” of climatic niche evolution and similar responses to climate change 100. For future climate scenarios, we used the three global warming scenarios (Shared Socioeconomic Pathways, SSP1-2.6, SSP2-4.5 and SSP5-8.5) that depict low, intermediate and very high greenhouse gas emissions, respectively 101. For each scenario, four General Circulation Models (GCMs) including MIROC6, IPSL-CM6A-LR, EC-Earth3-Veg and CNRM-CM6-1 at a spatial resolution of 10 arc-minutes latitude-longitude (about 20 km at the equator) in climate change projections were used to calculate the median value of five bioclimatic variables. We chose three time periods in the near current (1970–2000), 2041–2060 and 2081–2100. For simplicity of expression, we use 2050 to represent 2041–2060, and 2090 to represent 2081–2100. We also obtained elevation data with a resolution of 5 arcminutes from WorldClim2.1.
We adopted a set of widely used high-resolution future land-use data based on 0.05° resolution under diverse socioeconomic and climate scenarios 102. The data in each year includes a grid-explicit fraction of each of the land types. We used the dataset under three climate scenarios (SSP1-2.6, SSP2-4.5 and SSP5-8.5) and corresponding periods (2050 and 2090) for the following analyses.
Predicting the global metaweb with deep-learning modeling
We predicted the global meta-food web by applying a deep-learning model to relate empirical species traits to interaction probability based on empirical interaction records (Fig. 1, Step 2 and 3). Several assumptions were adopted to minimize erroneous predictions of forbidden links 103. (1) Predator species could not be herbivores. (2) We kept links for predator species that were not in the range of a potential prey species because species’ ranges may change in the future 7. This means our predicted metaweb is not the simple aggregation of local food webs, but a reserve of potential links to cope with the rewiring of food webs in response to environmental changes. (3) Each observed predator-prey interaction relationship was assumed to be stable and could not convert to its opposite with predator becoming prey 12. While these assumptions were heuristic, we would like to emphasize that our goal is not to construct a food web as realistic as possible, but to elucidate general patterns of food web complexity at global scales.
Our deep learning model was a binary classification model that linked the outcome variable (whether one species eats another species) and traits of species pairs. We used the deep learning framework supplied by R package “tidymodels” and “Keras” to train a multilayer perceptron (MLP) model 104 (Fig. 1, Step 2). The predictor variables were the functional traits of species. In addition to the seven previously mentioned traits, we also considered another seven traits about physiology and taxonomy to further distinguish different taxa. These traits included the flight capability of each species, the taxonomy each species belongs to (order level, one of Mammalia, Aves, Reptilia and Amphibia), whether it is a poikilotherm (organisms with variable body temperature), and whether it is an amniote (the most recent common ancestor of extant mammals and reptiles, and all its descendants, including but not limited to extant birds, mammals and reptiles). Given the above traits, the model was trained to detect patterns that relate traits to predator-prey interaction probabilities. In addition to the imputation of missing values, we selected hyperparameters that were necessary to enhance predictive performance in the deep-learning model. We used the “tune” package to test predictive performance on a grid of hyperparameters including hidden units, dropout, and epochs. We also used the “Exponential Linear Unit” (ELU) algorithm to speed up learning in deep neural networks and get higher classification accuracies 105.
The full dataset was divided into two parts, the training dataset and the test dataset. The model performance was evaluated with area under the receiver operator characteristic curve, kappa (the accuracy of the constructed model to the accuracy of a random model) and true skill statistic (Fig. S3). In the model, the outcome variable was the predicted probability of interaction, and 0.5 was set as the cutoff to distinguish whether a predicted interaction between two species occurred or not. By summarizing all the successful predicted interactions, an ultimate global metaweb was constructed (the “predicted metaweb”) (Fig. 1, Step 3).
Inference of local food webs at current and future times
The local food webs (in 1×1 degree cells) at current and future times were inferred by matching species distributions and predicted interactions (Fig. 1), including species distribution modeling with climate variables (Step 4), distribution refinement with species habitat preference and land-use change (Step 5) and local food web construction (Step 6).
The whole species distribution modeling process follows the standard ODMAP (Overview, Data, Model, Assessment and Prediction) protocol for reporting species distribution models 106; see Box S1 for detailed information. SDMs were implemented with an ensemble forecasting framework to obtain robust forecasts 107 with four different algorithms: one regression model (GAM) and three machine learning models (Random forest, XGBoost and Maxent). These models were employed to construct the current bioclimatic envelope models and project predictions to future climate scenarios and periods (SSP1-2.6, SSP2-4.5 and SSP5-8.5 in 2050 and 2090). For each species, we generated presence points based on refined species’ habitat-suitability maps and absence points by systematically sampling outside the presence points based on 0.5°resolution 28,108. We excluded species with ten or fewer presence points because most algorithms do not perform well with small sample sizes 109. We performed this process three times to eliminate the bias and reduce uncertainty caused by single simulations. In each replicate run, we randomly chose 75% of observations to calibrate the model and the remaining 25% were saved for validation. Thus, we obtained 12 models in total for each species, based on the combination of four statistical models and three replicate runs. All models with a TSS (true skill statistic) value larger than 0.7 and a ROC value larger than 0.8 were considered as well-performing models and acceptable 28. For each species, all acceptable models were merged to generate an ensemble prediction of distribution 107,110. We generated SDMs for 18,963 species (4,773 for mammals, 9,627 for birds, 6,289 for reptiles and 4,390 for amphibians) through the “biomod2” package in R 110,111. For the remaining narrow-ranged species that were not modeled, we used the current distribution maps by assuming constant distributions over time 112.
Not all species have the same dispersal capacity. Here, we applied a group-specific dispersal distance limitation method for each of the four main taxa of terrestrial vertebrates. Following previous studies 21,28,113, we applied two dispersal scenarios (no-dispersal and max-dispersal). In the no-dispersal scenario, the future distribution of all modeled species will be limited to the predicted current distribution pixels, which is a pessimistic assumption but important for in situ conservation. In the max-dispersal scenario, species’ distributions were limited to a buffer area based on realistic dispersal distances. These buffer areas around the current species ranges were based on dispersal distances of 2000, 2000, 3000, and 4000 km, for amphibians, reptiles, mammals and birds, respectively. These dispersal distances represent a maximum dispersal distances and exclude regions and environmental conditions that are not occupiable by the species 28,108,114. Compared to birds and mammals, we adopted smaller dispersal limits for amphibians and reptiles because of their limited dispersal ability 113,115,116.
We refined the climate-based species distributions by excluding unsuitable habitat types altered by land use change. Specifically, we first mapped the IUCN Habitat Classification Scheme (v.3.1) to International Geosphere Biosphere Programme (IGBP) classes. Then, we linked plant functional types (PFTs) classification, IGBP classes and land-use data we adopted here to refine species habitat areas (See Table S4 and Table S5 for details). Following the classification scheme transformation methods 15,117 and the practice 99 in previous studies, we refined predicted species distributions of climatic niches as permissively as possible: species were allowed to exist in a pixel if the percentage of a species preferred habitat in the pixel was larger than zero.
Based on the above species distribution predictions and land use refinement under the corresponding scenarios and periods, we obtained the species checklist in each 1×1 degree cell. Following the intersection methods of previous studies 66,118,119, we retrieved the local food webs based on the predicted global meta web and the species assemblage composition in each 1×1 degree cell (Fig. 1, Step 6). Such methods have been successfully applied to comparative studies across environmental gradients 120. Next, based on species distributions, we excluded the trophic links whose species’ elevational ranges do not intersect to perform an elevation-filtering process. Specifically, we applied the previously refined species distributions to determine the elevation distribution ranges of species under each climate scenario and period, thus forming a dataset of dynamic elevation changes with the change of species distribution. Finally, more than 15,000 local food webs were generated in each time period and climate scenario. We removed the cannibalistic links and isolated nodes from the analysis. To evaluate the extent to which (and locations where) our local food webs represent local species assemblages, we defined coverage as the proportion of the species in each reconstructed local food web divided by the species number originally distributed in each cell (Fig. S2 A). Hence, coverage gives a measure of the effectiveness of food web construction in each 1×1 degree cell.
Food web structures
We calculated basic and widely-used descriptors of food webs that characterized complexity, trophic groups and network topology. Web size (S, species richness in a web), links (number of trophic links), linkage density (all realized links divided by S), connectance (all realized links divided by S2), characteristic path length (mean shortest food chain length between species pairs), number of basal, top predator and intermediate species, proportion of omnivorous species, generality (number of prey per predator species), vulnerability (number of predators per prey species), and network modularity. For modularity, we calculated standardized network metrics relative to the null expectation by calculating z-scores following the method of Kortsch et al. 121. This procedure eliminates the effect of network size on network structure and makes the structural metrics comparable along environmental gradients 33,36,37. To capture the spatio-temporal change of relative connectivity and interdependence among four taxa, we also calculated the percentage of interclass links (the trophic links whose two nodes are from different terrestrial vertebrate classes) relative to the total links according to the predefined module division based on four classes in terrestrial vertebrates (Aves, Mammalia, Amphibia and Reptilia). For number of basal, intermediate and top species, we determined each species’ identification by calculating each species’ prey-averaged trophic level (PATL) from the predicted metaweb122, a classical method to rank species based on their vertical position in food webs123,124. The top species are identified as those species in the top 5% of PATL distribution, the basal species are those with PATL equal to 1, and the rest are intermediate species. All food web metrics were calculated in R with packages “igraph” 125 and “cheddar” 126. In the context of climate and land-use change, metrics for web size, links, connectance, modularity, number of basal species and generality were selected to include in the main text of the article because they were sensitive to environmental changes in our preliminary analyses; the patterns for other metrics were listed in the supplementary materials.