We combined fine-resolution population and greenspace mapping with population-weighted exposure models to explore the spatial differences in human exposure to greenspace globally, specifically at country, state, and county levels. Focusing on global large cities, we conducted comprehensive assessment and comparison of their urban greenspace exposure and the associated inequality. We further performed statistical analysis to examine the main drivers of the highlighted greenspace exposure inequality. To address the third question posed in the introduction, we leveraged time series greenspace observations to investigate the impact of vegetation growth seasonality on greenspace exposure level and inequality.
Data
Global hierarchy of administrative unit layers. We used the Global Administrative Unit Layers (GAULs) of the Food and Agriculture Organization of the United Nations at country, state, and county administrative levels in 2015 as hierarchical units for the spatial analysis of greenspace exposure globally. GAULs represent the compilation and dissemination of the best available information on administrative units for all countries in the world, providing a contribution to the standardization of the spatial dataset representing administrative units 25.
Global urban areas. In addition to GAULs, we quantified greenspace exposure for 1028 urban areas globally. Different from the commonly used administrative boundaries, urban area boundaries were based on the latest Global Urban Boundaries shapefiles for 2018 26. This dataset was extracted from 30-m-resolution Landsat imagery using a hierarchical approach to improve the homogeneity of built-up areas in urban centers and to maintain the heterogeneity of built-up areas at the urban fringes 26. To ensure a sufficient sample size for statistical analyses using the Gini index (where n is the number of ~1 × 1 km pixels within the urban area), we selected urban areas with a geographic area of >100 km2, which resulted in a total of 1028 urban areas.
Population. We used the WorldPop dataset for 2020 to quantify the spatially explicit distribution of population. WorldPop provides the estimated number of people residing in each 100 × 100 m grid based on a random forest model and a global database of administrative unit-based census information 27, which has much finer spatial resolution and update frequency than other population datasets such as the GWP 28 and LandScan 29.
Greenspace. We leveraged the European Space Agency’s latest global baseline land cover product for 2020 (WorldCover) at 10-m spatial resolution to quantify the spatial distribution of greenspace. The WorldCover map includes 11 different land cover classes with overall accuracy of 75% globally 30. The joint use of Sentinel-1 and Sentinel-2 satellite data not only enhances the spatial resolution of the WorldCover map to 10 m, but it also provides reliable land cover information in areas with persistent cloud cover 30. We extracted all types of forest, shrub, grass, herbaceous wetland, and mangrove from the WorldCover maps as greenspaces.
Modeling and statistical analyses
Greenspace coverage. We first aggregated the 10-m WorldCover greenspace binary map to 100-m grids to realize the mean fractional greenspace coverage. Then, we calculated the physical greenspace coverage rate by overlapping the derived greenspace fractional map with different GAULs according to Eq. (1):
where Gi represents the fractional greenspace coverage of the i-th grid, N is the total number of grids within the specific administrative unit, and GC is the physical greenspace coverage rate for this administrative unit.
Human exposure to greenspace. We applied the population-weighted exposure model 9,10,31,32 to quantify the spatial interaction between population and greenspace. The population-weighted model is a bottom-up assessment that considers the density and distribution of both population and greenspace by allocating higher weights proportionately to greenspace exposure where more people reside according to Eq. (2):
where Pi represents the population of the i-th grid, G represents the fractional greenspace coverage of the i-th grid considering nearby green environments with a buffered radius of d (i.e., 500, 1000, and 1500 m in this study), N is the total number of grids within the corresponding administrative unit, and GE is the corresponding population-weighted greenspace exposure level.
Greenspace exposure inequality. We adopted the widely used Gini index metric 33 as a measure to assess the global inequality in greenspace exposure following the approach of Song et al. 10. Details of the modeling process are provided in the Supplementary Materials. The Gini index ranges from 0 (absolute equality) to 1 (absolute inequality), with a lower value indicating that the amount of greenspace exposure is more even, and vice versa.
Model validation. We used three types of greenspace metric (greenspace fraction, population-weighted greenspace exposure, and the Gini index of greenspace exposure), derived from the classification maps of the Sentinel-2 optical satellite images, as benchmarks with which to evaluate the accuracy of those greenspace metrics derived from the 10-m-resolution European Space Agency WorldCover map. Four major steps were involved in this task. First, we used a maximum value composite approach to generate the yearly vegetation green metrics across 1028 urban cities globally by selecting the maximum normalized difference vegetation index (NDVI) value on a pixel-by-pixel basis from the clear-sky satellite time series. In addition to the maximum NDVI value, we also recorded the corresponding surface reflectance of blue, green, red, and near-infrared spectral bands, and the normalized difference water index for the following image classification purpose. To minimize the uncertainty of cloud cover and cloud shadows for the NDVI composition, we excluded low-quality pixels through application of cloud and cloud shadow masks obtained from the Sentinel-2 cloud probability product, which records the pixel-based cloud probability using a machine learning approach (https://developers.google.com/earth-engine/tutorials/community/sentinel-2-s2cloudless). The default cloud mask parameters suggested by the Sentinel Hub services and Sentinel Hub cloud detector were adopted in this cloud and cloud shadow removal process (https://developers.google.com/earth-engine/tutorials/community/sentinel-2-s2cloudless). Second, using the composited Sentinel-2 image, we conducted a vegetation and non-vegetation classification using a random forest machine learning approach with a total of 3607 training samples (749 pixels for vegetation and 2858 for non-vegetation) and 15 decision trees. To assess the accuracy of the Sentinel-2 classification, we randomly generated a total of 20,560 validation samples across 1028 urban cities globally (20 validation samples for each city), and excluded 34 of the validation samples that were encompassed by the cloud or cloud cover masks. Consequently, 20,526 effective validation samples were obtained (Fig. S8). The vegetation cover conditions of those validation points were determined by the following two steps. 1) We conducted a linear spectral unmixing with the three endmembers of vegetation, non-vegetation, and water across the selected global urban cities, and labeled all validation samples with a vegetation fraction larger than the threshold of 0.60 as vegetation pixels. 2) For the remaining validation samples, we classified them as vegetation or not through visual interpretation of both composited Sentinel-2 and high-resolution Google Earth images. With the classification of validation samples, we evaluated the accuracy of the Sentinel-2 classification results, which revealed satisfactory performance for the four widely used metrics considered (overall accuracy: 0.95, precision: 0.94, recall: 0.95, and F1-score: 0.95, Table S2). We also randomly selected 16 urban cities covering the major continents as samples to demonstrate the visual assessment of the Sentinel-2 classification results (Fig. S9), which revealed reasonable spatial patterns of classification when compared with the raw Sentinel-2 images. These independent visual and quantitative evaluations collectively supported the feasibility and robustness of the integration of Sentinel-2 imagery and the random forest classifier for deriving greenspace metrics. Third, we applied the calibrated random forest classifier to the Sentinel-2 images across the global urban cities to classify the images into vegetation and non-vegetation components, and we then calculated the city-level greenspace fraction, population-weighted greenspace exposure, and Gini index of greenspace exposure. Finally, we used these greenspace metrics as references with which to evaluate the accuracy of the corresponding greenspace metrics extracted from the European Space Agency WorldCover map, which revealed overall acceptable accuracy with the estimated regression slope of close to one and high Pearson’s correlation coefficients (Fig. S10). The slight difference between the greenspace metrics from the two dataset sources is attributable to the greenspace seasonality that arises from vegetation phenology dynamics.
Seasonal change in greenspace exposure inequality
To explore the seasonal impact of vegetation phenology on greenspace exposure inequality, we first generated Sentinel-2 image compositions using the maximum value composite approach for global urban areas in spring, summer, autumn, and winter. We then used the random forest models to classify the seasonal composites of the Sentinel-2 images. By modeling the spatial human–greenspace interaction using Eq. (2), we further calculated the Gini index of greenspace exposure in each season. We finally investigated the dynamics in the Gini index of greenspace exposure through pairwise comparison (e.g., summer Gini vs. winter Gini) using scatter plots for the four seasons.
To account for the seasonal difference in urban greenspace exposure inequality, we used a dimensionless measure of dispersion, i.e., the coefficient of spatial variation (csv), to quantify the spatial variability of greenspace exposure. This coefficient is defined as the ratio of the mean value ( ) to the standard deviation (SD) value ( ). The csv was calculated based on gridded greenspace exposure for each city:
We further calculated the SD value over the four seasons to quantify the overall spatial and temporal variation (cstv) of greenspace exposure:
Drivers of greenspace exposure inequality
We compiled a suite of variables to examine their associations with urban greenspace exposure inequality, as measured by the Gini index of greenspace exposure. The inclusive variables comprised five categories (Table S1): (i) geographic variables (latitude and longitude), (ii) topographic variables (elevation and slope), (iii) climate variables (mean monthly precipitation, mean monthly temperature, and mean monthly vapor pressure deficit), (iv) socioeconomic variables (nighttime light intensity per km2 as a proxy of Gross Domestic Product, population density per km2, and road length per km2 within the urban areas), and (v) landscape variables (greenspace coverage rate as a proxy of composition, mean patch size, largest patch size, mean patch perimeter–area ratio, mean patch shape index, and edge density as proxies of configuration). We first rescaled all variables to the range of 0–1. Pearson’s correlation was calculated to check the correlations between the Gini index of greenspace exposure and these five-category variables (Table S1, Fig. S11). We then conducted a partial correlation analysis to further quantify the relationships between the Gini index of greenspace exposure and each explanatory variable by controlling the effects of the other variables (Table S1, Fig. S11). Based on a set of rules including a significance level of a p-value < 0.001, |Pearson’s r| > 0.1, and |partial correlation coefficient| > 0.1, we refined the five explanatory variables for further analysis, including latitude (lat), precipitation (prcp), vapor pressure deficit (vpd), greenspace coverage rate (gcr), and edge density (ed). We then built four ordinary least squares multiple linear regression models to investigate the association between the variables from the different categories and the Gini index of greenspace exposure. The first model included only the geographic variable of latitude. The second model included only the climatic variables of precipitation and vapor pressure deficit. The third model included only the landscape variables of greenspace coverage rate and edge density. The fourth model included all the variables included in models 1–3. Based on the full model result (model 4), we used the variance inflation factor to measure the multicollinearity among the variables. Results showed that the all-inclusive variables achieved a variance in inflation factor of <4 (Fig. S12), demonstrating plausible model performance. The adjusted R2, standardized coefficient, and p-values were used to assess the regression performance. Additionally, we applied a machine learning algorithm, i.e., the random forest model, to build the association between all 16 explanatory variables and the Gini index of urban greenspace exposure. Variable importance was quantified by the indicators of the increase in mean square error and the increase of node purity (Fig. S13).
Variance partitioning was used to quantify the relative variations in the Gini index of urban greenspace exposure explained by the landscape factors into four fractions: (1) unique effect of greenspace provision (i.e., greenspace coverage rate), (2) unique effect of greenspace configuration (i.e., greenspace edge density), (3) joint effects of greenspace provision and spatial configuration, and (4) unexplained residuals.