Overview. We collected data on the measured environmental concentrations (MECs) of antibiotics in Chinese soil. A large number of antibiotics widely occurred in soil, and nine frequently detected antibiotics with wide distribution and high toxicity were selected: tetracycline, chlortetracycline, oxytetracycline, doxycycline, ofloxacin, norfloxacin, ciprofloxacin, enrofloxacin, and lomefloxacin. We also collected data on anthropogenic factors, climatic factors, and soil and vegetation characteristics that could explain the variability in the risks of antibiotic pollution as predictors. Next, we scaled up the risks associated with target antibiotics in soil using a machine learning model (RF algorithm)41. We combined a linear model, generalised additive model, and moving-window approach to describe the relationships between human activities and antibiotic pollution risks on four scales. All modelling and analyses were conducted using R software 4.2.1.
Data collection. To determine soil susceptibility to antibiotic pollution, a comprehensive MEC dataset was established. We compiled the publicly available dataset of Zhang et al.42, which records antibiotic occurrence and distribution in Chinese soil. This dataset contains georeferenced occurrences for our target antibiotics. Moreover, we searched Google Scholar and the China National Knowledge Infrastructure database (http://www.cnki.net/) for field investigations of antibiotics in Chinese soil and gathered 376 studies published from 2000 to 2020. The search terms were ‘antibiotics’ or ‘pharmaceutical and personal care products’ and ‘soil’ or ‘solid matrices’. We recorded the average concentration of target antibiotics from the literature, considering land use type (urban green space, cropland, grassland, and forestland) and experiment location (coordinates) following the standards of Zhang et al.42. Concentrations that were ‘not detected’ or ‘below detection limit’ were regarded as zero values. The MECs were converted into standardised units (µg/kg or ng/g). When the data were presented in figures, mean values and standard error were extracted using GetData 2.26 (www.getdata-graph-digitizer.com). The antibiotic measurements in the dataset correspond to different soil depths, from 5 to 40 cm, with an average depth of ~ 20 cm. When scaling up the susceptibility of soil to antibiotic pollution, we considered a fixed depth value of 0–20 cm.
To expand the existing data, we also collected soil samples from Yunnan and Zhejiang provinces and measured the concentrations of target antibiotics. The soil samples were transported to the laboratory, freeze-dried, and stored at − 20°C until further analysis. The extraction and analysis of antibiotics are detailed in Supplementary Methods (Supplementary Section 2). The measured antibiotic data were combined with the dataset. Our dataset contained the records of antibiotics in most regions of China, which comprehensively depicts the MECs of antibiotics in soil (Supplementary Table 1).
Risk assessment model. To assess the ecological risks of the target antibiotics, the predicted no-effect concentrations (PNECs) were first estimated via the assessment factor approach43. PNECs were determined as the ratio of the median effective concentration (EC50) or median lethal concentration (LC50) to the assessment factor (AF = 1000), or the ratio of the chronic no-observed-effect concentration (NOEC) to the assessment factor (AF = 100); that is, PNEC = EC50 (or LC50)/1000 or PNEC = NOEC/100. Owing to the scarcity of available data, the toxicity data of the relevant terrestrial organisms (e.g., vegetables, crops, earthworms, and bacteria) were collected from the literature44. Then, the species-specific RQs of the antibiotics were calculated as the ratio of MEC to PNEC (that is, RQ = MEC/PNEC). For some antibiotics, the PNECs of soil (PNECsoil) were estimated from their values in water (PNECwater), which were corrected by the solid–water partition coefficient (Kd) via the equilibrium partitioning method (that is, PNECsoil = PNECwater × Kd). To consider the worst scenarios, we defined the risk levels of antibiotic pollution as the highest RQ values. We determined the cumulative risks in soil as the sum of the highest RQs of each target antibiotic.
Identification of candidate predictor variables. The predictor variables were determined from spatially explicit datasets (Supplementary Table 2). Six anthropogenic factors were collected, namely land use type, livestock density (cattle, pigs, and chickens), human population density, GDP, chemical fertilisation (nitrogen and phosphorus), and pesticide use, which are linked to socioeconomic development and agricultural activities. Climatic information on the annual average temperature and precipitation was sourced from the CRUTS V4 database45. Soil properties, which substantially influence the fates and behaviours of antibiotics in soil, were collected as important predictor variables; they included clay content, organic carbon content, bulk density, soil thickness (distance to bedrock), and saturated hydraulic conductivity. We selected groundwater table depth and normalised difference vegetation index as indicators of leaching and plant uptake capacity. Moreover, we obtained terrain information from the GTOPO30 database46, as the terrain can influence antibiotic movement on land. To ensure compatibility across datasets and variables, we processed all data at a spatial resolution of 1 km. The data were extracted according to the experimental site location. The anthropogenic and climatic factors and soil and vegetation characteristics determined the amount of antibiotic residue in soil and the soil susceptibility to antibiotic pollution.
Scaling-up approach and output maps. An RF model including all of the predictor variables41 was fitted to the log-transformed (log10) RQs of each antibiotic using R packages randomForest47 and Caret48. All predictor variables were standardised using the Z-score. We constrained the RF models by setting the maximum number of allowed trees to 1000. Since the RQ data were modelled, cross-validation was important. The dataset was randomly divided into training and validation sets, corresponding to 70% and 30% of the data, respectively. Finally, the data-trained RF model was applied to the gridded data of predictors to estimate the risk levels of nine target antibiotics on a larger scale. the ecosystems not covered by soil (including bare land and water,) were excluded from the analysis based on the land-cover map. An ensemble model for our final predictions and uncertainty analysis was developed via the Monte Carlo approach. Our final model was an ensemble of 500 RF models (that is, the model was separately trained 500 times; each time, a different set of data was selected from our dataset) to generate a distribution of model error on random data49. The distribution density of differences between measured and predicted data was determined for each target antibiotic to assess the accuracy of the RF models. We derived the models’ mean prediction across the ensemble models and determined the uncertainties associated with the predictions from the standard deviation. Maps of the distribution of soil susceptibility to antibiotic pollution, expressed as average RQs, and the corresponding uncertainty maps were derived using the data-trained RF models at a resolution of 1 km.
Finally, we integrated the antibiotic pollution risk, land use composition, and management indicators to identify the regions of concern, where human activities resulted in critical antibiotic pollution. We standardised the land use composition and management indicators via the minimum–maximum method and calculated land use intensity by summing the standardised scores. Antibiotic pollution risk was scaled to 0–1, and then multiplied by 4 (the maximum score of land use intensity) to maintain the comparability. The concern level was calculated as the sum of antibiotic pollution risk and land use intensity, and it was further divided into five levels via the natural break method.
Scale effects and human impacts on risk of antibiotics. The primary goal of the analysis was to identify the effects of spatial scale on the relationships between human activities and antibiotic pollution risks. We evaluated the scale effects by generating datasets after changing the grain size (that is, the resolution at which data are generated at a given extent), and then resampled the datasets after changing the scale extent (that is, the proportion of sampled data at a given resolution). First, to explore the scale effects, four watershed levels (sizes 3, 6, 9, and 12) were selected from the HydroSHEDS dataset50 to represent the changes in spatial grain (Supplementary Fig. 5); here, these levels are renamed levels 1, 2, 3, and 4, respectively. Watershed level 1 represented a broad-scale grain, while level 4 was a small watershed scale.
The land system can be manipulated by technologies and governance systems across spatial scales and were more sensitive in comparison with other human activities29. Thus, we identified areas highly subject to human disturbance using the geographically gridded maps of the land system, which includes land use patterns and management practices. We summarised the mean risks of antibiotics at four watershed levels and the mean land use composition and mean management intensity. Land use composition was reclassified into the area proportions of arable, built-up, and natural (mainly consisting of grassland and forestland) lands. Land management intensity was indicated by the manure nitrogen application rate and irrigated area proportion (Supplementary Table 2), because manure application and irrigation are generally considered the main pathways of antibiotics into soil7. Finally, we integrated the antibiotic pollution risk and land use composition and management indicators to identify the effects of scale on the interaction between soil antibiotic pollution and land system.
We generated a human impact index to characterize the human pressures at four watershed levels using the framework developed by Venter et al.51 with minor modification, which contained seven anthropogenic variables including arable land, pasture land, built-up land, manure nitrogen application rate, irrigated area proportion, GDP, and population density. For comparison across variables, we placed each variable on a 0–10 scale. The grid cells of built-up land were assigned a score of 10, which represents the pressures from the built environment, and the cells of arable land and pasture land, which represent the agricultural environment, were assigned pressure scores of 7 and 4, respectively. Other land use types were scored zero. The grid cells were divided into 10 quantiles with scores 1 to 10, corresponding to increasing manure nitrogen application intensity and irrigated area proportion. Cells with no manure application and irrigation were assigned zero, and GDP standardisation followed the same procedure. Grid cells with a high human population density (> 1000 people/km2) were assigned a score of 10, and other cells with a lower density were assigned a lower log-scaled score51:
$$Score=3.333 \times \text{log}(Population density+1)$$
Furthermore, we summed the scores of these variables in each grid cell to estimate the cumulative human impacts. For any grid cell, the human impact index can range from 0 to 50. The data were divided into two sets (high- and low- human impacted groups) according to the Hu Huanyong line21. We separately fitted the risks of antibiotics to the human impact index using a linear regression model. The relationships between RQs and the seven adopted human activity indices were assessed using multiple linear regression models.
Second, to avoid arbitrary model choices, we conducted an additional 20 runs by selecting different data to explore the effects of spatial extent. We ranked the data according to increasing human impact index. Reduction in scale extent was through the omission of 0–90% of segments (in intervals of 10%) from the lower and upper limits of the ranked data (Supplementary Fig. 7c); that is, 20 datasets were generated for linear model fitting. Through this approach, data with different ranges of human impacts and scale extents were generated. Then, the best models were selected via a procedure based on the corrected Akaike information criterion (AICc; ΔAICc < 5). Model averaging was conducted to calculate the coefficients of predictors and test their significance. This procedure was performed using the dredge function in the R package MuMIn52. Moreover, we characterised the contributions of land use and management to antibiotic-related risks. For doing so, we expressed the contributions of predictors as the percentage of variance explained, according to the absolute values of the ratio of the standardised regression coefficients to the sum of all standardised regression coefficients from all predictors.
Using a moving-window approach, we analysed the effects of land systems on antibiotic-related risks20. We generated a range of window sizes and calculated the relative range of RQs (the ratio of the range of RQs under a given window size to the maximum range under all window sizes). When the relative range exceeded 60%, we fixed the window size. For each window, we calculated the mean RQ and four variables (area proportions of arable land and built-up land, manure application rate, and irrigated area proportion). The partial Pearson correlation coefficients were also calculated. Then, we fitted RQs and partial correlation coefficients against the four variables using a generalised additive model to analyse the effect of the variables on the relationships between human impacts and risks of antibiotics in soil. All of the above-mentioned procedures were repeated on four watershed levels to explore the scale effects.