Global mining data
The mining dataset was sourced from the S&P Capital IQ Pro database 33. It provides the latitude and longitude of mine sites for each mineral and metal commodity. We drew 10 km buffers around the S&P mine locations, following the recommendations of Maus et al. (2020), and overlaid the buffers with the mapped mining areas from the Maus et al. dataset34. We included only those buffers that contained Maus et al. (2020) mining areas, as these correspond to currently operational mining sites. This resulted in a total of 8,675 mine sites. Because socio-economic data were available for only 27 countries, the household wealth analysis was restricted to the 1,373 large-scale mining operations located in those countries. For the water scarcity analysis, we overlaid the 8,675 mine sites with the global water-scarcity dataset. Retaining only the sites that overlapped with the water-scarcity data left 8,103 mine sites.
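The buffer-and-overlay filter described above can be illustrated with a minimal sketch. It is not the authors' workflow: it assumes each mapped mining area is reduced to a single representative point (for example a polygon centroid) and uses great-circle distance instead of a GIS buffer. The function names and the input layout are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mines_with_mapped_area(mine_points, area_points, buffer_km=10.0):
    """Keep mines whose buffer contains at least one mapped mining area.

    mine_points: list of (name, lat, lon) tuples (hypothetical layout).
    area_points: list of (lat, lon) representative points of mapped areas
                 (assumption: polygons simplified to points).
    """
    return [
        (name, lat, lon)
        for name, lat, lon in mine_points
        if any(haversine_km(lat, lon, alat, alon) <= buffer_km
               for alat, alon in area_points)
    ]
```

A polygon-based overlay in a GIS library would be closer to the procedure in the text; the point simplification here only keeps the example self-contained.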
Household wealth index
Data on the household wealth index came from the Demographic and Health Surveys (DHS) database35. The wealth index is a composite measure of a household's living standard, derived from the assets the household owns. It ranges from 1 to 5, with 1 being the poorest and 5 the wealthiest. We examined all countries (27 in total) for which georeferenced household survey data were available from 2000 to 2019 and where our mining dataset indicated the presence of mining activities. We considered only rural households, as livelihood outcomes in these communities are more directly affected by land-use decisions in adjacent areas. In total, we considered 1.32 million households. Because the DHS provides geographic data at the cluster level, the unit of analysis was the population cluster, giving 60,064 clusters across the 27 countries. To protect the privacy of the surveyed population, the DHS displaces rural cluster coordinates by up to 5 km, with 1% randomly displaced by up to 10 km. The number of clusters per country ranged from 203 in Kyrgyzstan to 42,239 in India. The distribution of clusters reflects the geographic area and population size of each country, as well as the frequency of standard DHS surveys conducted there. Complete details on the number of observations for each country, together with the number of DHS standard surveys conducted from 2000 to 2019, are provided in supplementary material Table S2. Typically, each cluster comprises 20-30 households, although this number can vary depending on survey design and country-specific requirements. The dependent variable was the household wealth index for rural areas, taken from the DHS standard surveys. We assigned each cluster the median wealth index of its households as a measure of rural cluster wealth. For each country, we then calculated the distance from each cluster to the nearest mining area.
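The two cluster-level quantities described above, median wealth and distance to the nearest mine, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: mining areas are treated as points, the per-country restriction is left to the caller, and all function names and input layouts are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cluster_median_wealth(households):
    """Median wealth index (1 to 5) per cluster.

    households: iterable of (cluster_id, wealth_index) pairs
                (hypothetical layout).
    """
    by_cluster = defaultdict(list)
    for cluster_id, wealth in households:
        by_cluster[cluster_id].append(wealth)
    return {cid: median(values) for cid, values in by_cluster.items()}


def nearest_mine_km(cluster_lat, cluster_lon, mines):
    """Distance in km from a cluster to the closest mine point.

    mines: non-empty list of (lat, lon) for mines in the same country
           (assumption: the caller filters by country beforehand).
    """
    return min(haversine_km(cluster_lat, cluster_lon, mlat, mlon)
               for mlat, mlon in mines)
```

Because DHS cluster coordinates are displaced for privacy, any distance computed this way carries an error of a few kilometres for rural clusters.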
Covariates of household wealth
We also assessed a number of demographic, geographic, and agro-ecological variables that are known to influence household wealth 36,37 (Table 1). For demographic factors, we examined the education level and gender of the household head, using data from the DHS standard survey for the corresponding survey year 35. Geographic factors included the distance to the nearest main road, the distance to the nearest urban center, and the population density around the DHS cluster. We used OpenStreetMap data to calculate the distance to the nearest main road, and the 2015 population density dataset from the Center for International Earth Science Information Network (CIESIN) to determine population density 38,39. For agro-ecological variables, we used the Food and Agriculture Organization/International Institute for Applied Systems Analysis Global Agro-Ecological Zones to classify agricultural soil suitability for rainfed, high-input cereals 40. The final covariate was the percentage of tree cover in the year 2000 within a 5 km buffer around each population cluster, based on forest cover data from Hansen et al. (2013)41.
Mixed-effects regression modelling
We used a linear mixed-effects model, fitted with the ‘lme4’ package in R, to estimate the wealth coefficients (eq. 1). The model predicts the household wealth index from the previously mentioned geographic, demographic, and environmental predictors. We also included random effects to account for potential variation within administrative regions and across survey years.
where β0 is the intercept, the coefficients β1, ..., β8 represent the fixed effects of each independent variable, b1j and b2k represent the random effects for different years and for different regions within the country, respectively, and ε is the error term.
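A form of eq. 1 consistent with these definitions is sketched below, reading β0 as the intercept. The covariate labels, their order, and the subscripts i (cluster), j (survey year), and k (region) are assumptions based on the eight predictors described in the text (distance to the nearest mine plus the seven covariates), not the authors' notation.

```latex
\begin{equation}
\begin{aligned}
W_{ijk} ={}& \beta_0
  + \beta_1\,\mathrm{MineDist}_{i}
  + \beta_2\,\mathrm{Educ}_{i}
  + \beta_3\,\mathrm{Gender}_{i}
  + \beta_4\,\mathrm{RoadDist}_{i} \\
 &+ \beta_5\,\mathrm{UrbanDist}_{i}
  + \beta_6\,\mathrm{PopDens}_{i}
  + \beta_7\,\mathrm{Soil}_{i}
  + \beta_8\,\mathrm{TreeCover}_{i} \\
 &+ b_{1j} + b_{2k} + \varepsilon_{ijk}
\end{aligned}
\end{equation}
```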
Water scarcity assessment
For our water scarcity analysis, we used baseline water stress (BWS) data from the World Resources Institute (WRI) Aqueduct Global Maps 2.1 data42. The estimates of water stress are derived from long-term time series data and therefore reflect chronic water stress rather than acute drought conditions. BWS measures total annual water withdrawals (municipal, industrial, and agricultural) expressed as a percentage of the total annual available blue water. Areas with withdrawals of up to 10% are classed as low water stress, 10% to 20% as low to medium, 20% to 40% as medium to high, 40% to 80% as high, and above 80% as extremely high; areas with above 97% withdrawal are classified as arid. BWS maps were overlaid with maps of mining areas to determine the water stress level at each mining location. This enabled us to determine for which minerals ambient levels of water stress may increase competition for finite water resources between mining and other societal and environmental needs16.
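The stress bands listed above can be written as a small classifier, followed by a tally of mine sites per commodity and band. This is a minimal sketch that follows the thresholds as stated in the text; treating each upper bound as inclusive is an assumption, and the function names and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def classify_bws(withdrawal_pct):
    """Map a withdrawal percentage to the stress band described in the text.

    Upper bounds are treated as inclusive (assumption), e.g. exactly 10%
    falls in "low". The "arid" band above 97% follows the text as written.
    """
    if withdrawal_pct < 0:
        raise ValueError("withdrawal percentage cannot be negative")
    if withdrawal_pct > 97:
        return "arid"
    if withdrawal_pct > 80:
        return "extremely high"
    if withdrawal_pct > 40:
        return "high"
    if withdrawal_pct > 20:
        return "medium to high"
    if withdrawal_pct > 10:
        return "low to medium"
    return "low"


def stress_by_commodity(sites):
    """Count mine sites per stress band for each commodity.

    sites: iterable of (commodity, withdrawal_pct) pairs, one per mine
           site (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for commodity, withdrawal_pct in sites:
        counts[commodity][classify_bws(withdrawal_pct)] += 1
    return {commodity: dict(bands) for commodity, bands in counts.items()}
```

Grouping the counts by commodity mirrors the final step in the text, which asks for which minerals mining coincides with high ambient water stress.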