Soil sampling
Soil samples were collected between July and September 2017 from 114 locations in agricultural fields that had been under cultivation for at least 10 years. Maize and rice were selected as representative crops for dryland and wetland soils, respectively, and because they are globally important crops and are cultivated extensively across China. In each location, sampling sites were selected from two adjacent maize and rice fields less than 5 km apart, yielding 114 and 114 soil samples associated with maize and rice cultivation, respectively (see Fig. 1 in supplemental material). The sampling sites extended from 18.30°N to 48.35°N and 87.61°E to 99.91°E across eastern China, covering tropical wet and dry climates, warm temperate climates, warm summer continental climates, and hot summer continental climates from south to north. Three 100-m2 plots were sampled at each site, and five soil cores obtained per plot at a depth of 0–15 cm were combined. Standard testing methods were adopted for the measurement of soil pH, cation exchange capacity (CEC), total organic C, and percentage of clay, as previously described[45-47].
Soil biodiversity analysis
In total, we observed approximately 60,894 taxa in the 228 agricultural fields. The diversity of soil archaea, bacteria, fungi, protists, and invertebrates were analyzed using high-throughput sequencing, by targeting the 16S rRNA region in archaea, the 16S rRNA region in bacteria, the internal transcribed spacer 1 region for fungi, and the 18S rRNA region in protists and invertebrates. Corresponding polymerase chain reaction assays were performed using the Arch519F/Arch915R, 515F/907R, ITS5-1737F/ITS2-2043R, TAReuk454FWD1/TAReukREV3 primer pairs, respectively[48, 49], and sequencing was performed on an Illumina HiSeq2500 platform (Illumina Inc., San Diego, CA, USA). The acquired sequences were filtered for quality control as previously described [50, 51]. Any chimeric sequences were removed using the USEARCH tool based on the UCHIME algorithm [52]. Subsequently, the sequences were split into operational taxonomic units (OTUs) at a 3% dissimilarity level using the UPARSE pipeline [52]. OTUs with fewer than two sequences were removed, and their representative sequences were assigned to taxonomic lineages using the RDP classifier within the SILVA database (release 128) for bacteria and archaea, UNITE+INSD (UNITE and the International Nucleotide Sequence Databases) for fungi, and PR2 (Protist Ribosomal Reference Database) for protists and invertebrates.
Before we calculated the soil organism diversity, the OTU tables were resampled to a minimum number of sequences from each sample, at 36,880 for archaea, 27,712 for bacteria, 30,369 for fungi, 5393 for protists, and 207 for invertebrates, which showed even sampling depth within each belowground group of organisms (Additional file 1: Fig. 2). Protists were defined as all eukaryotic taxa, excluding fungi, invertebrates (Metazoa), and vascular plants (Streptophyta), according previous studies[2, 35]. Here, we used richness (that is, number of soil phylotypes) as a metric of soil biodiversity, which is the most extensively used, as well as the simplest, metric of biodiversity [2]. We use the term soil biodiversity to refer to different types of richness in general terms. Overall, we calculated the richness of the most prevalent prokaryotic and eukaryotic organisms in our soil samples, including archaea, bacteria, mycorrhizal and saprophytic fungi, protists, and invertebrates. The identities of saprophytic and mycorrhizal fungi were determined using FUNguild [53]. We used only highly probable and probable guilds with an identified single trophic mode for the analysis.
To obtain a multidiversity index, we combined soil biodiversity characteristics by averaging the standardized scores of richness of archaea, bacteria, mycorrhizal and saprophytic fungi, protists and invertebrates. The scores standardized based on a common scale ranging from 0 to 1 were calculated according to the following formula: STD= (X−Xmin)/(Xmax−Xmin); where STD is the standardized variable and X, Xmin, and Xmax are the target variable, its minimum value, and its maximum value across all samples, respectively. The multidiversity index generally accepted and has been used extensively in the current biodiversity–function literature[9, 14, 30, 54, 55]. Particularly, the richness of each belowground group of organisms was highly correlated with their Shannon diversity (Pearson r=0.75–0.94, P<0.001, Additional file 1: Fig. 3). The multidiversity calculated based on Shannon diversity was highly correlated with that calculated based on richness (Pearson r=0.67, P<0.001), and also showed positive associations with average multifunctionality (Additional file 1: Fig. 3). This trend was also observed when considering phylogenetic diversity (Additional file 1: Fig. 4). In addition, the soil biodiversity indices included in the averaging index were not highly multicollinear (r<0.8). The results of the analyses suggest that the choice of diversity metric would not alter our results.
Ecosystem functioning measures
Fifteen ecosystem functions regulated by soil organisms under a broad range of ecosystem services were included in the present study: nutrient provisioning (extracellular enzyme activities related to sugar degradation [β-glucosidase and saccharase], chitin degradation [N-acetylglucosaminidase], P mineralization [phosphatase], soil dissolved organic C, N, and P availability, microbial C and N stocks), element cycling (soil Sulfur [S], iron [Fe], copper [Cu], zinc [Zn], and manganese [Mn] availability, involved in bio-electron transfer and energy exchange in the epigeosphere), and resistance to plant pathogens (reduced relative abundance of fungal plant pathogens in soils). Although we have included 15 ecosystem functions, some important functions are inevitably unmeasured, such as process rates including nitrification rates, denitrification rates, N mineralization, and decomposition rates, and future studies are encouraged to include more essential functions for comprehensive understanding of ecosystem functioning.
All of these are state variables rather than rates of changes.
In all the soil samples, the extracellular enzyme activities were measured using fluorometry as described previously [56]. The soil dissolved organic C, available N, available phosphorus, microbial biomass C, microbial biomass N, available S, available Fe, available Cu, available Mn, and available Zn were all measured using standard soil testing procedures[45-47]. The relative abundance of potential fungal plant pathogens in soils was obtained from the sequencing analyses and was inferred by parsing the soil phylotypes using FUNguild [53]. Similarly, we applied only highly probable and probable guilds with an identified single trophic mode for the analysis. The inverse abundance (reduced relative abundance) of potential fungal plant pathogens was obtained by calculating the inverse of the variable (total relative abundance of fungal plant pathogens×−1). Because the values for ecosystem functions varied widely, we standardized all variables to a common scale ranging from 0 to 1 (as described above).
To derive a quantitative multifunctionality index for each sample, we used three independent multifunctionality approaches[57]: (1) multiple single functions[2], (2) averaging multifunctionality index[9], and (3) principal coordinate multifunctionality index[58]. To obtain an averaging ecosystem multifunctionality index, we averaged the standardized scores (a common scale ranging from 0 to 1) of all individual ecosystem functions. We also used principal coordinate analysis to examine the different dimensions of multifunctionality[58]. Note that we do not argue that one is better or more appropriate than the other.
Statistical analyses
Linking soil biodiversity to multifunctionality. We conducted ordinary least squares linear regressions between soil multidiversity (standardized averaged of the diversity of soil organisms) and single soil organisms with multifunctionality and multidimensional functioning (axes of a principal coordinate analysis including 15 functions), and performed Spearman correlation analyses between the diversity of single soil organisms and single functions. In addition, linear regression relationships between the diversity of major phylotypes were estimated in different groups and ecosystem functions. We selected the dominant phylotypes of different groups, accounting for >85% sequences in total. The dominant phylotypes were: archaeal (Thaumarchaeota and Euryarchaeota), bacterial (Proteobacteria, Acidobacteria, Chloroflexi, Actinobacteria, Bacteroidetes, Firmicutes, Gemmatimonadetes, Planctomycetes, and Nitrospirae), mycorrhizal fungi (Ascomycota, Basidiomycota and Glomeromycota), saprotrophic fungi (Ascomycota, Basidiomycota, Chytridiomycota and Zygomycota), protists (Cercozoa, Ciliophora, Chlorophyta, Ochrophyta, Lobosa and Stramenopiles), invertebrates (Nematoda, Arthropoda, Rotifera, Annelida, Gastrotricha, and Platyhelminthes) and a Nematoda functional group (Herbivore, Bacterivore, Animal-Preditor, Fungivore, Animal-Parasite and Omnivore).
Structural equation model and random forest analyses. We used structural equation modelling (SEM) to evaluate the direct link between soil biodiversity and multifunctionality (averaging), and between complexity of soil food webs and strength of BEF (explained below), after accounting for multiple drivers such as climate (mean annual temperature) and soil properties (soil pH, total C, and percentage of clay) simultaneously (Additional file 1: Fig. 9 and 13, priori models). We obtained climatic data including mean annual temperature (MAT) for all sampling sites from the Worldclim database (www.worldclim.org). Here, we did not consider mean annual precipitation since the source of water in the agricultural fields was partly from artificial irrigation. All the variables were included as independent observable variables. The goodness of fit of SEM models was checked using the following procedures: the χ2 test, the root mean square error of approximation (RMSEA) and Comparative Fit Index (CFI), the model had a good fit when the CFI value was close to 1 and the P values of the statistics were high (traditionally>0.05) [59]. With a good model fit, we were able to interpret the path coefficients of the model and their associated P values. A path coefficient is analogous to the partial correlation coefficient, and describes the strength and sign of the relationship between two variables. SEM was conducted using the “lavaan” package in R environment (v3.5.1; http://www.r-project.org/) [60]. In addition, random forest (RF) analysis was performed to identify the major driving factors. To estimate the importance of the variables, we used percentage increases in MSE (mean squared error) of variables: higher MSE% values imply more important variables. The analysis was conducted using the “rfPermute” package in R [61]. Significance of the model was assessed with 5000 permutations of the response variable using the “A3” package in R [62].
Co-occurrence networks. To explore the species interrelationships within complex soil food webs, ecological co-occurrence networks consisting of dominant soil phylotypes from all types of organisms were constructed. We focused on the most dominant phylotypes—those that were both abundant (top 10% of all identified prokaryotes and eukaryotes in terms of relative abundance) and ubiquitous (>50% for prokaryotic and >10% for eukaryotic organisms of all locations) across all soil samples, and identified ecological clusters of strongly co-occurring soil phylotypes within the networks. Using such a filtering approach, our aim was to ensure that we identified dominant phylotypes in agricultural fields and minimize potential spurious correlations from the rare taxa[2, 17, 63]. We focused on the dominant soil phylotypes because they are expected to have a disproportionate functional importance in their ecosystems, and are distributed widely. Although many bacterial taxa are widely distributed, it is unlikely to be the case for eukaryotic organisms. Therefore, we applied a ubiquity threshold of >50% for prokaryotic and >10% for eukaryotic organisms in all locations.
Robust correlations with Spearman’s correlation coefficients (ρ) >0.65 and false discovery rate (FDR)-corrected P-values <0.001 were used to construct networks. We expected that the cut-off, which has been extensively used in literature and is comparable across studies[2], would have both a mathematical and biological meaning and reveal the organisms that are strongly correlated with each other. Nevertheless, we acknowledge that correlation network analyses can yield spurious results, and associations between taxa within such networks cannot be directly interpreted as interactions, and might not fully represent the architecture and connectedness of soil food weds in real-world conditions. However, the information derived from the networks is essential for generating novel hypothesis and ecological frameworks (to be tested in future experiments) on the role of strongly co-occurring phylotypes within food webs in controlling multifunctionality and the link between soil biodiversity and ecosystem functions.
Our network included 1,513 dominant and strongly co-occurring soil phylotypes. The soil phylotypes were dominated by 263 archaea, 1,077 bacteria, 114 fungi, 58 protists, and 1 invertebrate. We identified the ecological clusters within our ecological network using the “igraph” package, and calculated the richness of soil organisms within each ecological cluster across all samples. In addition, we extracted sub-networks by preserving the phylotypes of individual soil samples using the induced_subgraph function in “igraph” package in R [64]. The topological features of the sub-networks in each sample were calculated to estimate the potential complexity of soil food webs, including the number of nodes and edges, average degree, clustering coefficient, average path length, network diameter, and graph density. Average path length refers to the average network distance between all pairs of nodes; network diameter refers to the greatest distance between the nodes that exist in the network; average degree refers to the average connections of each node with another unique node in the network; clustering coefficient represents the degree to which the nodes tend to cluster together; and graph density refers to the intensity of connections among nodes [65, 66] Therefore, higher numbers of nodes and edges, average degree, clustering coefficient, and graph density, and lower average path lengths and diameters suggest a more connected network, reflecting more potential complexity of soil food webs [45, 67]. Networks were visualized using the interactive Gephi platform (https:// gephi.org).
Linking the complexity of soil food weds to the strength of BEF. We applied the moving window approach to explore the determinants of BEF relationships. The technique facilitates the analysis of multivariate data along an ecological gradient [68, 69]. To ensure adequate data amounts, we selected two window sizes of 20 and 30 consecutive samples across sites, generating 209 (e.g. 1-20, 2-21 … 209-228) and 199 (e.g. 1-30, 2-31 … 199-228) data points, respectively. The window was advanced across the sampling sites after reordering along the latitude gradient, generating adjacent sampling subsets to better estimate the complexity of soil food weds and BEF relationships. Based on each sub-dataset, we estimated the amount of variance in multifunctionality explained (R2 of the ordinary least squares linear regressions) by soil biodiversity to reflect the strength of BEF, and constructed soil networks as described above. We then calculated one index to reflect the potential complexity of soil food webs using the topological features of the soil networks via multidimensional scaling analysis[58] (Additional file 1: Table 3 and 4). Note that average path length and diameter, denoting the network sparsity, was calculated as the inverse of the variables (×−1) before the calculation of the index. Subsequently, SEM and RF analysis were performed to evaluate the effect of complexity of soil food webs on BEF strength (as explained above).