Accessibility coverage estimates
Estimates of accessibility coverage, modelled by constructing a travel time grid at 100-meter resolution for all of sub-Saharan Africa (Supplementary Fig. 1), diverge greatly across the six population datasets (Fig. 1A and B). For all of sub-Saharan Africa, the population that has access to healthcare is highest when using HRSL, followed by GHS-POP (Fig. 1B). Differences in accessibility coverage are larger at 30- and 60-minute catchments and, as expected, decrease as travel times increase. An estimated 88.2% of the HRSL-derived population has access to a health facility within 30 minutes travel time. This value drops to 60.5% when GPWv4 is considered. Access to healthcare is in general substantially lower when statistics are derived using the GPWv4 and WorldPop top-down unconstrained datasets (Fig. 1B). These two datasets also present the largest differences in accessibility coverage as compared to the other datasets (Fig. 2). Although the differences between the remaining datasets are smaller, coverage still differs by up to 9.5% among them at 30 minutes travel time (Fig. 2). The relative differences are smallest between Landscan and WorldPop top-down constrained and between HRSL and GHS-POP. Accessibility coverage already varies strongly at the sub-Saharan African level, but continental summary statistics substantially mask variations at the national and sub-national level.
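A coverage statistic of this kind can be illustrated with a minimal sketch: the population in all grid cells whose travel time falls within a threshold is summed and divided by the total population. The function name `accessibility_coverage`, the treatment of NaN as no-data, and the toy 2 x 3 grids below are illustrative assumptions and do not reproduce the modelling pipeline used in this study.

```python
import numpy as np


def accessibility_coverage(population, travel_time, threshold_minutes):
    """Percentage of population living in cells reachable within the threshold.

    Both inputs are assumed to be co-registered grids of identical shape;
    NaN marks no-data cells (an assumption of this sketch).
    """
    population = np.asarray(population, dtype=float)
    travel_time = np.asarray(travel_time, dtype=float)
    if population.shape != travel_time.shape:
        raise ValueError("population and travel_time grids must have the same shape")

    # Ignore cells where either grid has no data.
    valid = np.isfinite(population) & np.isfinite(travel_time)
    total = population[valid].sum()
    if total == 0:
        return float("nan")

    reached = population[valid & (travel_time <= threshold_minutes)].sum()
    return 100.0 * reached / total


# Hypothetical toy grids: travel time in minutes and people per cell.
toy_travel_time = np.array([[10, 25, 40], [70, 130, 200]])
toy_population = np.array([[50, 30, 10], [5, 3, 2]])

coverage_30 = accessibility_coverage(toy_population, toy_travel_time, 30)
coverage_60 = accessibility_coverage(toy_population, toy_travel_time, 60)
```

In this toy example, 80 of 100 people live within 30 minutes and 90 of 100 within 60 minutes. Swapping in a different population grid while holding the travel time grid fixed changes the result, which is the comparison made across the six datasets.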
Moving from sub-Saharan African to national coverage statistics, we find new patterns, with varying results between and within countries (Fig. 3 and Supplementary Table 2). Strongly divergent trends are particularly evident in some countries, including Chad, Sudan, Eritrea, South Sudan, Central African Republic, Republic of the Congo, Democratic Republic of the Congo, Equatorial Guinea, and Gabon (Fig. 3). In these countries, we observe lower coverage statistics for GPWv4 and WorldPop top-down unconstrained, sometimes followed by significant discrepancies between the coverage values for the other datasets. The differences in accessibility coverage can exceed 60% and would affect any conclusion drawn from one of the individual population datasets. In the Republic of the Congo, for example, accessibility coverage at 30 minutes travel time ranges from 28.8% to 88.9%. Using GPWv4 or WorldPop top-down unconstrained suggests that 71.2% or 65.5% of the population in the country is unable to reach the nearest health facility within half an hour travel time. In contrast, using GHS-POP, HRSL, Landscan, or WorldPop top-down constrained indicates that 11.1%, 13.9%, 15.8%, or 27.3% of the population is unable to reach healthcare within half an hour. This discrepancy between the datasets may have a strong impact on the conclusions drawn from monitoring global and national indicators of access to healthcare, and thus on decision making for resource allocation.
Figure 4 illustrates accessibility coverage within 1-hour catchments at the sub-national (i.e., administrative level 1) level. Supplementary Data 1 presents accessibility coverage for 30, 60, 90, 120, 150, and 180 minutes travel time at administrative level 2. Despite the similarities in overall accessibility patterns, with low access in northern and central sub-Saharan Africa and higher access in southern sub-Saharan Africa and coastal regions, sub-national differences between the datasets are clearly evident. Low accessibility coverage is particularly widespread for GPWv4 and WorldPop top-down unconstrained. In Figure 5 we present the average percentage point difference between the datasets that we observe at the sub-national level. The average difference between all datasets can be as high as 45.4%. When individual datasets are compared with each other, however, the sub-national average difference can exceed 70% (Fig. 5B).
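How differences between datasets can be summarized is sketched below using the national 30-minute values reported above for the Republic of the Congo, with coverage taken as 100 minus the share of the population unable to reach a facility. Averaging the absolute differences over all dataset pairs is an assumption made for this illustration; the exact aggregation behind Figure 5 may differ.

```python
from itertools import combinations

# 30-minute coverage (%) for the Republic of the Congo, derived as
# 100 minus the "unable to reach" shares quoted in the text.
coverage = {
    "GPWv4": 100 - 71.2,
    "WorldPop top-down unconstrained": 100 - 65.5,
    "GHS-POP": 100 - 11.1,
    "HRSL": 100 - 13.9,
    "Landscan": 100 - 15.8,
    "WorldPop top-down constrained": 100 - 27.3,
}

# Absolute percentage point difference for every pair of datasets.
pairwise = {
    (a, b): abs(coverage[a] - coverage[b])
    for a, b in combinations(coverage, 2)
}

# Assumed summary statistics: mean and maximum over all 15 pairs.
mean_difference = sum(pairwise.values()) / len(pairwise)
max_pair, max_difference = max(pairwise.items(), key=lambda item: item[1])

print(f"mean pairwise difference: {mean_difference:.1f} percentage points")
print(f"largest difference: {max_difference:.1f} ({max_pair[0]} vs {max_pair[1]})")
```

The largest pairwise difference in this sketch is between GPWv4 and GHS-POP and corresponds to the 28.8% to 88.9% range reported for the country.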
Explaining discrepancies in coverage estimates
Most of the observed discrepancies in accessibility coverage can be explained by the characteristics and quality of the input data and the redistribution approach used for creating the gridded population datasets. More specifically, the main differences in accessibility coverage that we observe can be explained by 1) the use of settlement data to constrain population to buildings, 2) the quality and resolution of the settlement data used, and 3) the granularity of the smallest publicly available unit for population data. In Figures 4 and 5, the differences in accessibility coverage are particularly evident between datasets that constrain population to settlements (i.e., WorldPop top-down constrained, HRSL, GHS-POP, and Landscan) and the other datasets that allocate population based on dasymetric weighting or other areal interpolation techniques. Constrained population datasets typically use building footprints or settlement feature data derived from satellite imagery to restrict the distribution of population to grid cells in which buildings have been detected. The datasets based on settlement data have a large proportion of zero cells in areas where no buildings are detected31. This means that population is commonly distributed over smaller areas and is therefore more concentrated in regions with human activity and health facilities. In contrast, datasets that do not contain information on settlements have a small proportion of zero cells. This is a natural consequence of using approaches that spread population over vast areas of land where few or no people are likely to reside, including largely uninhabitable areas such as deserts or dense forests where there are no health facilities. These distorted distributions ultimately result in longer travel times for part of the population and therefore lower overall accessibility estimates.
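The mechanism described above can be sketched with a toy example in which the same population total is reallocated over one administrative unit in two ways: evenly over all cells, and evenly over settled cells only. The ten-cell strip, the settlement mask, the total of 10,000 people, and the use of uniform weights are hypothetical simplifications; none of the actual products allocates population exactly this way.

```python
import numpy as np

# Hypothetical administrative unit: a strip of ten grid cells.
travel_time = np.array([10, 20, 45, 90, 180, 240, 300, 360, 420, 480], dtype=float)
# Cells in which buildings were detected (assumed mask).
settled = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1], dtype=bool)
total_population = 10_000.0


def allocate(total, weights):
    """Distribute a unit's population total over cells in proportion to weights."""
    weights = np.asarray(weights, dtype=float)
    return total * weights / weights.sum()


def coverage(population, travel_time, threshold_minutes):
    """Percentage of population in cells within the travel time threshold."""
    return 100.0 * population[travel_time <= threshold_minutes].sum() / population.sum()


# Unconstrained: every cell receives population, including unsettled land.
unconstrained = allocate(total_population, np.ones_like(travel_time))
# Constrained: only cells with detected buildings receive population.
constrained = allocate(total_population, settled)

coverage_unconstrained = coverage(unconstrained, travel_time, 30)
coverage_constrained = coverage(constrained, travel_time, 30)
```

With these assumed inputs, the unconstrained allocation places 20% of the population within 30 minutes, against 50% for the constrained allocation, although the population total and the travel time grid are identical in both cases.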
In northern Chad, for example, accessibility coverage at 30 minutes travel time is between 58.1% and 72.4% using HRSL, GHS-POP, Landscan, or WorldPop top-down constrained, and drops to almost 0% when GPWv4 or WorldPop top-down unconstrained is considered. Similar patterns are observed in northern Niger and other regions south of the Sahara desert. This region is sparsely populated and shows large differences in accessibility patterns between the datasets. Figure 6 shows an example of the observed visual differences between the datasets. The same is true for some regions in central sub-Saharan Africa, such as the Republic of the Congo, Gabon, and the Democratic Republic of the Congo, where large areas of land are characterized by dense and closed forests with very few detected settlements (Supplementary Fig. 2). In Ogooué-Maritime, a province in western Gabon characterized by dense forests, accessibility coverage within 30 minutes ranges from 87.9% to 96.3% when using WorldPop top-down constrained, Landscan, HRSL, or GHS-POP, in ascending order of coverage. However, accessibility coverage decreases to 11.1% and 3.8% when WorldPop top-down unconstrained and GPWv4, respectively, are used. Comparisons of accessibility coverage between the settlement-based population datasets also show discrepancies (Fig. 2 and Fig. 5), as their accuracy appears to be highly dependent on the complete identification of individual building structures. The quality of the underlying satellite data containing information on built environments and the methodology applied to automatically extract built features both introduce omission and commission errors, leading to an under- or overestimation of uninhabited areas20, 35, 36. While WorldPop top-down constrained uses polygon building footprint data and HRSL uses high resolution satellite imagery (~50 cm), GHS-POP extracts built features from Landsat 8 imagery with a resolution of ~30 meters30.
Due to the difficulty of interpreting built-up areas from coarser satellite imagery, GHS-POP and, to a lesser extent, Landscan have previously been found to overestimate uninhabited zones and thus underestimate the number of people in sparsely populated sub-urban and rural areas20, 37, 38. We found similar patterns in two rural areas in Garissa and Nakuru counties in Kenya, where settlement detection diverged between the gridded population products (Supplementary Figures 3 and 4). GHS-POP in particular did not seem to allocate population to small settlements that were included in the other datasets (Supplementary Figure 4). When no population is allocated to small rural settlements, a relatively large proportion of the population is distributed into larger built areas where facilities are located. This likely contributes to higher accessibility coverage statistics for GHS-POP and Landscan as compared to HRSL and WorldPop top-down constrained.
An important challenge for all gridded population datasets is the quality and granularity of the input population data. Even though census data is often collected at the household level or in small enumeration areas, countries usually release aggregated data at specific administrative levels to protect privacy18. The scale at which the latest population census is made publicly available varies widely across sub-Saharan Africa (Figure 7A) and ranges, on average, from about 2 km² to 182,211 km². Figure 7B illustrates the association between population input unit size (km²), relative coverage difference between the datasets at 1-hour travel time, and average total population per administrative unit (level 1). The figure shows that in areas where there are large differences in accessibility coverage between the datasets, the size of the population input unit is generally large and the total population living in these units is small, mostly in the first or second quantile (Fig. 7B, top right corner). This means that differences between the datasets are greatest when population counts in thinly populated areas are aggregated into large units. Figure 5 and Figure 7 show similarities between areas with high accessibility coverage differences and regions with large population input units, such as the northern and central parts of sub-Saharan Africa. Sangha, for instance, a region in the Republic of the Congo, has one of the highest average accessibility coverage differences between all datasets (45.4%). The average total population of 45,281 people is spread out over approximately 57,686 km² of land, and the landscape consists primarily of dense forests, complicating building detection. The same is true for an area described earlier, Ogooué-Maritime province in Gabon, where the average coverage difference is 45%, the average total population is 44,230, the population input unit size is 7,528 km², and the landscape is dominated by dense forests (Supplementary Figure 2).
The aggregated nature of the input population data masks the spatial variability in population distribution at finer scales and therefore causes uncertainty when total population counts are reallocated into grid cells. Our analysis suggests that, particularly in thinly populated areas where population data is made available at a coarse scale, the different redistribution techniques used by the datasets produce the most divergent reallocation patterns and thus translate into widely ranging accessibility coverage estimates.