2.1 Study area
The East River is a major tributary of the Pearl River, South China, with approximately 90% of the East River catchment located in Guangdong Province. Population in the catchment has increased rapidly, particularly in the densely populated cities of Huizhou and Shenzhen, increasing from 14.9 million in 2010 [7] to 17.6 million people in 2020 [8]. A significant proportion of the population has been known to release wastewater untreated, which has significant implications for aquatic organisms, with observations of contaminants at concentrations above the predicted no effect concentration (PNEC) [4]. The release of untreated wastewater to waterways also has implications for water supply. The East River is the source of the majority of the region’s water supply, including most notably, Hong Kong; ~64–89% of Hong Kong’s potable water was sourced from the East River between 2004–2014. But, as a result of high levels of pollution in the East River, various measures have been taken to ensure that clean water is provided for Hong Kong. Firstly, the water abstraction site on the East River was moved upstream in 1999, to a position with superior water quality [9]. The water is then stored within multiple reservoirs, with water within Shenzhen reservoir treated by a bio-nitrification plant and finally pumped to Hong Kong for final water treatment before consumption [10].
Our case study focuses on subcatchments above Bolou gauging station, and downstream of the Xingfengjiang, Baipenzhu and Fengshuba reservoirs; which includes the city regions of Heyuan, Huizhou and Shenzhen, as shown by Fig. 1.
2.2 Estimating the spatial distribution of wastewater discharge
A list of STWs operational in China is made available annually from the Chinese government, which has been developed in two different formats. The first format was published on the website of the then named Ministry of Environmental Protection (MEP), which provided a list of the names of STWs in each city region, with details regarding the treatment type, the treatment capacity, and the average volume of wastewater treated each day. The population served, service area and STW locations are not provided, although the name of the STW may sometimes indicate its approximate location. This list was distributed up until 2014 and is now produced in a different format by the Ministry of Ecology and Environment (MEE), which is the successor to the MEP. The new list provides more information about the location of each STW including the province, city, district and the town or street. However, the list did not appear to contain information about STWs that had been built prior to 2014; although this is not certain, as there is no information contained within
the list itself that can be used to determine when the STW was built. But it was noted that named STWs that were identified to be constructed prior to 2014 from satellite imagery, were not included in the list. Therefore, both versions of the list were required to locate and identify STWs in the catchment.
For STWs contained in the list provided by the MEE, the address was used to locate the STWs in the Heyuan, Huizhou and Shenzhen areas. To do so, the name of the town or street was input to the search facility in Google Earth, and then the STW was located by examining the satellite imagery in Google Earth. Sometimes, the STW was not possible to locate, in these cases if no other STWs served a town or small city, the STW was assumed to be located immediately downstream of the settlement.
The list provided by the MEP was used first. The MEP STW list does not contain addresses for the STWs and therefore it was significantly more difficult to locate STWs on the list. For each STW, the city region it was located in, and the year that it was constructed was listed, which was not sufficient to locate each STW alone. Satellite imagery was examined for each settlement in the catchment using Google Earth. The Worldpop [11] population dataset was used to identify the location of settlements. This was made more efficient by smoothing the population dataset using the focal statistics tool within ArcGIS spatial analysis toolbox [12]. For each cell, the sum of the surrounding cells was calculated. As a result, the population of low population cells in town centres was increased, as these cells are usually surrounded by cells with high population. Moreover, the population of isolated high population cells were reduced significantly, as they were usually surrounded by cells with very low population. The resulting dataset was categorised in such a way as to only display cells that were determined to equal or exceed the population density of an urban area.
By smoothing the population dataset using focal statistics, the location and extent of settlements was clearer. Figure 2 shows the result of this visual modification and further how this may aid the identification of a STW. Figure 2b displays a satellite image, if compared to Fig. 2a the areas of interest are easier to determine and thus can be examined for STWs such as those identified in Fig. 2c. This process is not essential, but the authors found that the resulting dataset significantly increased the efficiency of locating STWs and enabled a more systematic search.
STWs that were located were then checked against the MEP’s STW list. The approximate construction date for located STWs was determined using Google Earth’s historic imagery feature, by identifying which images contain the presence of the STW, and those that do not; and therefore, with an indication of the construction date and the location, some, but not all STWs were identified. For those remaining, other data sources were used. The website of the Institute of Public and Environmental Affairs [13] was used, which provided a web-based map displaying and naming a significant number of STWs. Unfortunately, the website no longer provides this service as far as the authors can determine.
2.3 Population dataset
To best estimate population served, a realistic gridded population dataset was essential to predict spatially explicit emissions across the catchment. There are a limited number of gridded population datasets available at the appropriate resolution. Most notable are Worldpop [11] and the Gridded Population of the World version 4 (GPWv4) [14]. GPWv4 utilises raw census data and applies that to the respective vector-based census boundary. This is converted to raster, assuming a uniform distribution throughout the census area. For the current application, a more spatially resolved dataset is required. Landscan and Worldpop distribute census data similarly, however they also utilise additional datasets such as road networks and nightlights to predict the spatial distribution of population within the census boundaries. At the time of use, the Landscan dataset utilised data from the previous census in 2000, whilst the GPWv4 and Worldpop utilised the latest census data from 2010, which is better suited for our purposes.
Worldpop distributes population at a relatively high resolution of approximately 100m at the equator. However, the Worldpop population within census boundaries was found, in places, to be dissimilar to the population count estimated by GPWv4, and total population of Heyuan, Shenzhen and Heyuan cities recorded in the 2010 census. Therefore, the Worldpop dataset was adjusted using census data from GPWv4 for 2010. At the time of use, the 2015 “Top-down unconstrained” version of the Worldpop dataset was used, however additional datasets have been released.
The boundaries used to create GPWv4 were not available; however, it was possible to recreate them by means of backwards calculation. The slice tool within the ArcGIS spatial analysis toolbox was used [15], using the "natural breaks" setting; this reclassifies the data into its natural groupings (grouping similar values) whilst maximising the differences between the classes. Cell values within a boundary are identical and therefore, the slice tool groups these values together, and reclassifies them into an integer which enables them to be converted from a raster dataset into a polygon dataset. The disadvantage with this method is that the developers of GPWv4 [14] assigned cells that lay on the boundary to share the value of all intersecting boundaries; these cells are unique, and therefore these cells are grouped only with other cells on the boundary, as shown by Fig. 3. The eliminate tool in ArcGIS spatial analysis toolbox [16] was used to assign the majority of these unique cells to surrounding polygons; the eliminate tool merges selected polygons, in this case border polygons, with neighbouring polygons that have the largest shared border.
The resulting census polygons were used to determine the sum of population within each boundary for GPWv4 and Worldpop. Furthermore, the cells of the Worldpop dataset were adjusted so that the total population within each boundary area matched the population of the GPWv4 dataset. This was achieved using the using Eq. 1 below:
$${\text{R}}_{\text{A}}={\text{P}}_{\text{C}}\text{*}\left(\frac{{\text{R}}_{\text{O}}}{{\text{P}}_{\text{O}}}\right)$$
1
where:
RA - adjusted population value of a cell
PC – actual population within a given census boundary
RO- population count for a cell within the raster being modified
PO - population within census boundary as estimated from original population grid.
To create the population dataset for 2015 and 2020, the population in each cell was first adjusted by the relative difference between each cell in the original Worldpop dataset in 2010 and each cell in the Worldpop 2015/2020 datasets using Eq. 2; this was calculated to account for changes in the distribution of population. The modified population datasets were then adjusted by the relative difference between the total population of each city region, as estimated by the modified population dataset, and the population in each city from official estimates in 2015 and 2020 [17–19], using Eq. 3.
$${\text{R}}_{1}=\text{A}\text{*}\left(\frac{{\text{P}}_{2}}{{\text{P}}_{1}}\right)$$
2
$${\text{R}}_{2}={\text{R}}_{1}\text{*}\left(\frac{{\text{C}}_{\text{p}}}{{\text{C}}_{1}}\right)$$
3
where:
R1 - adjusted population count within a cell
A – adjusted 2010 population count within a cell
P2 – Original Worldpop population count (non adjusted) for 2015 or 2020
P1- Original Worldpop population count (non adjusted) for 2010
PO - population within census boundary as estimated from original population grid
R2 – adjusted population count within a cell, adjusted to 2015 or 2020 population levels
Cp – population within a city according to official census estimates
C1 – population within a city estimated from the raster of R1.
2.4 Streamline and subcatchment generation
A digital elevation model (DEM) is a representation of the elevation surface and may be used for applications such as defining the river catchment area and the river topology. A DEM is readily available from global datasets (e.g. the Shuttle Radar Topography (SRTM) dataset [20]). A vital application of a DEM is to estimate the direction of flow, which can then be used to determine which cells contribute runoff to a river stretch, which can subsequently enable river catchment boundaries and river topology to be generated.
Sub-catchments and the overall catchment for the East River were generated from a 1 arc-second SRTM DEM. The SRTM DEM was projected to an North Asia Albers equal area conic projection with a cell size of 25m. Riverlines were then generated from the DEM using a threshold of 1000 accumulating cells. Subcatchments were then generated for each river stretch (described hereafter as river stretch catchments). The catchment area for each river stretch excludes the area drained by upstream river stretch catchments. This was achieved by using the stream reach and watershed tool within the TauDEM toolbox [21]. Using the DEM, flow direction grid, flow accumulation grid and the river stretch grid (see Fig. 4) as inputs, polyline river stretches and polygon river stretch catchments were generated. Each river stretch and associated river stretch catchment is assigned a stretch ID (shared by the river stretch and the river stretch catchment).
2.5 Delineating wastewatersheds and estimating STW population served
The delineation process was completed manually, using the modified population dataset and topographic data to direct the process. Firstly, the mean population density was calculated for every subcatchment using the exactextractr R package [22], which was as it can account for when cells are only partially overlapped by a polygon; exactextractr also has the advantage of being highly efficient, which is useful when estimating statistics for hundreds of polygons.
For smaller settlements served by a single STW, it was assumed that the STW would serve the entire urban population. To estimate population served, the outline of the settlement adjacent to STWs was digitised in Google Earth. Subcatchments that were intersected by the digitised settlement outline were selected, but also subcatchments for river stretches between the STW and the town or city. The total population within the selected subcatchments was estimated to equal the population served by the STW.
Wastewatershed delineation within large cities (e.g. Huizhou, Shenzhen and Heyuan) required more precision and additional data. Where multiple STWs serve a settlement, it is challenging to determine which STW serves a particular area of the settlement. In addition, large sections of these cities may not be served by a STW. Data regarding population served or the service area of STWs within the area were available for some of the STWs within these areas. This was achieved by obtaining population and service area data from STW managers or from online reports or news articles, which first required the identification of the name of each STW as described previously. With the availability of service area data, subcatchments were added to a wastewatershed until the area of the wastewatershed equalled that of the known service area. Where the population served for a STW was known, subcatchments were added until the total population reached the known population served total. In both cases, subcatchments were selected based on a combination of proximity to the STW and the population density of the subcatchment. It was assumed that the wastewater network would be designed in such a way that the maximum number of people are served, for the minimum length of pipes. Subcatchments were more likely to be included if they were within the STW’s watershed, and subcatchments outside of the river catchment were not included. The rationale behind this approach was that we assume that the majority of the sewage network is gravity based, and therefore areas that are downslope of a STW are less likely to be served by it.
Once STWs with measured population served were delineated, it was then possible to delineate STWs without population served data. These STWs were delineated by assuming the entire area upstream of the STW was served by this STW, but excluding area served by others STWs.
2.6 Delineating upgraded STWs
There was also a need to determine the population served for STWs that had been upgraded in large urban areas, before and after their upgrade. To accomplish this, wastewatersheds for 2015 were delineated prior to the 2010 wastewatersheds. For the 2010 timepoint, wastewatersheds for STWs that were constructed after 2010 were removed, and wastewatersheds for STWs that had been upgraded since 2010 were modified so that their extent represented the area served prior to the STW being upgraded. In order to achieve this, where available, data were obtained describing population served and service areas of STWs for 2010, which was used to modify the wastewatershed so that it excluded areas of the catchment that were not served in 2010. In addition, any development that occurred between 2010–2015 was assumed to be indicative of additional buildings becoming connected to the sewer network, as it was assumed that during the development process, pipe laying could be integrated into the construction process. Areas within the wastewatersheds of interest that had indication of construction between 2010–2015 were digitised, as identified in Google Earth’s historic imagery. Areas within the upgraded STW’s 2015 wastewatershed were removed from the 2010 wastewatershed if they intersected a construction outline. The construction outlines were also used as “barriers”, with the assumption that pipes were not laid prior to the completion of the construction work, and therefore the area beyond the zone of construction was not connected to the STW.
2.7 UK application to test our approach
To test the approach used to estimate population served, we also applied the same methodology to a selection of STWs in South-West England. To do so we utilised the WorldPop dataset for the year 2014, which unlike the Chinese application, required no modification as the underlying census data was at a significantly higher resolution. Spatial data were available for STWs in the region, along with estimates of the population served by each STW. Population served estimates were provided in the form of population equivalents, which are calculated based on the biological oxygen demand (BOD) measured in wastewater, assuming 54 g of BOD equalling one person [23].
Otherwise, the same methodology applied to the East River catchment was applied to the catchments of South-West England. In total, population was estimated for 39 STWs for a variety of different population size.