Mapping soil pollution risk from riparian land-cover changes in the headwater area of an inter-basin water diversion project

. A novel scheme was proposed for mapping precise land-cover change in riparian zones and the resulted soil pollution risk of non-point source (NPS) from an inter-basin water diversion project on its headwater area. The scheme held three key points: 1) digital visualization of the riparian land-cover’s cyclical features was done using remote sensing data fusion of NDWI at high water level and NDVI and MNDWI at lower water level; 2) riparian zones were identified through color clustering on optimized RGB-fusion image that was picked out by applying Modified Vector-Angular Distance (MVAD) formula to measure difference between riparian and non-riparian zones in RGB space; 3) combining Export Coefficient Model (ECM) and the derived data on riparian land-cover change around Danjiangkou Reservoir in China, time-series loads of the soil NPS pollutants were estimated before and after the Middle Route of South-to-North Water Diversion Project (SNWDP) started running in 2014. The results indicated that risk of the potential soil pollution from land-cover change in the new riparian zones enlarged. Growth rates (GR) of total nitrogen (TN) and total phosphorus (TP) not only had significant spatial variation, but also were higher during initial running period of the SNWDP’s Middle Route.


Introduction
Large-scale water-transfer projects and their influence on water-diverted regions have become research hotspots. Water shortage hinders socioeconomic growth 1 . As a solution, water transfer has received interest amongst policymakers, and numerous water transfer projects had been implemented across the world including in Brazil, China, Germany, Australia and the United States, etc [1][2][3][4][5] . China, a country with a long history of inter-basin water transfer, started another high-profile water transfer project, i.e. the South-to-North Water Diversion Project (SNWDP) at beginning of the century 5 . The SNWDP consists of three subproject routes (namely, the East, Middle and West Routes) that will divert 44.8 billion m 3 of water annually from the water-rich Yangtze River Basin. Similar to other large-scale water transfer projects elsewhere in the world, the SNWDP aims to relieve water stress in drought-stricken areas (specifically in North China), promote socioeconomic development and help improve the eco-environment in water-deficient regions 3,6 . But for the water-diverted region, the activities of the impounded and diverted water and the resultant change of water cover will have complex and profound effects on the regional eco-environment, hydrology, local climate, and patterns of land use and land cover 3,[7][8][9][10] . So a lot of researches have been keeping eyes on influences from large-scale water-transfer projects on the water-diverted regions.
Now one focus is about how to get reliable information on land-cover changes in riparian zones and the resultant variation of soil pollution risk that's essential for objectively evaluating the influences of water-transfer projects. Compared to other areas in water-diverted regions, riparian zones deserve serious attentions [11][12][13][14][15][16][17] . As an interface between land and water, wild riparian zones act like natural biofilters that can remove pollutants from surface runoff before entering the water body. Fennessy and Cronk 18 showed that a riparian buffer zone of 20 to 30 m width can remove up to 100% of incoming non-point nitrate. Zhao et al. (2009) 19 found that the retention rate of agricultural non-point nitrogen pollution by three vegetation types in the riparian zone of the Yellow River wetland in China could exceed 50%. However, if human activities such as crop plantation exist in the riparian zone, more agricultural NPS pollutants and soil heavy metals are held in the area, which conversely increases the potential risk of soil pollution 20 ; and this risk is greater in new riparian zones resulting from the artificial activities of impounding and diverting water 21,22 . Thus, many studies have focused on the influence of large-scale inter-basin water-transfer projects on the risk of soil pollution stemming from the associated changes of riparian land cover in the water source area 20 .
However, obtaining precise information on the coverage and variation of the riparian zones was, and still is, a significant problem in such studies. As is known, riparian zones result from water level fluctuation, and can be viewed as temporary water cover according to definitions of land cover [23][24][25] . It thus seems feasible to obtain information on riparian zones using water coverage data extracted at different water levels from remote-sensing imageries, such as through threshold segmentation of modified normalized difference water index (MNDWI) images [26][27][28] . But, this popular method for water identification is not always suitable for riparian zones, especially those that emerged from large-scale engineering projects involving the artificial impounding and diverting water projects. The reason lies not only in the difficulties associated with obtaining an accurate threshold with constant water level fluctuation, but also in overemphasizing the hydrological characteristics of riparian zones and failing to thoroughly understand the zone's nature. It is impossible for a riparian zone to be same to the water cover, even if inundated, which can be proven by the former's higher heterogeneity 16 and unique characteristics of seasonal rotation between water and land. Thus, errors often occur when extracting information on riparian zones from remotely sensed classification data of water coverage at different water levels. Additionally, misclassification is still common when extracting water coverage in hilly areas from remote-sensing imageries, because accurately separating water from built-up areas and the shadows of mountains remains a problem [29][30][31][32][33] . Due to the "error amplification phenomenon" 34 , larger error rates consequentially emerge when riparian zones are identified using water coverage data at higher levels minus those at lower levels. Therefore, precise identification of riparian zones is indispensable for scientific evaluations of the influence of large-scale water-transfer engineering projects on land-cover change and the potential risk of soil pollution in the water-diverted regions.
The objectives of this study were to propose a scheme for getting reliable data on riparian zone variation and land-cover change, and further exploring the consequent potential risk of riparian soil pollution with water level fluctuation from the SNWDP's Middle Route on its headwater area. The idea was firstly implemented by digital visualization of the riparian cyclical features of water-land rotation that were recorded by time-series remote sensing data. Then, a computer-vision color clustering algorithm was applied to the picked out thematic imagery from an optimized RGB color-matching option, which ensured to extract precise information on the coverage of the riparian zones at different water levels. Lastly, combining the information on variation of the riparian Data sources. Water level and Landsat imageries. To explore inherent links between the water level change and the riparian zone variation, real-time data of water level elevation (WLE) in Danjiangkou Reservoir (Fig. 2) were collected from several public websites (http://www.cjmsa.gov.cn, www.moc.gov.cn, http://www.cjh.com.cn and http://www.hbnsbd.gov.cn/zhxx/GQ.aspx). The water level of Danjiangkou Reservoir varied mainly with natural seasonal change of inflow and outflow runoff before the SNWDP's Middle Route started running 38,39 . But, since the project went into service on 12 December 2014, the water level has been observably affected by the man-made activities of impounding and diverting water, more than that by natural hydrological processes 40 , and numerous new riparian zones therein came into being. It's hence important to study relationship between the reservoir's water level change and its riparian zone variation.
Additionally, for knowing the real-time changes of the riparian land-cover, some specific Landsat imageries at the lowest and highest annual water levels during 2000-2018 were collected ( Table 1).
Soil pollution and other data. The collected soil pollution data (0-20 cm) were focused on agricultural NPS pollutants and heavy metals from cropland in the new riparian zones. They comprised 59 NPS samples from field investigation (Fig. 3a) and 169 sample sites of soil heavy metals from existing studies (Fig. 3b) 21,22 , which mostly spread in the riparian zones between water levels of 150 and 170 m.
Other data consisted of land use data during 1990-2010 and Google Earth images at a resolution of 0.5 m, which provided information on the past croplands and improved the remotely sensed interpretation of the present-day croplands in the riparian zone.    Riparian zones have unique characteristics of annual cyclical rotation between water and land. Time series of remotely sensed imageries can record the seasonal change in riparian zones, thereby making it possible to gain reliable information on riparian zone variation and landscapes changes in the zone.
To transform the key properties of riparian zones into available digital and visual information, three steps were implemented: Firstly, a normalized difference water index (NDWI) 41 was constructed based on the Landsat imagery at high water level of Danjiangkou Reservoir, while a normalized difference vegetation index (NDVI) and MNDWI 32 were employed at the reservoir's lower water level. These indice are shown as Eq. (1) , and represent the reflectance of the green, red and near-infrared bands corresponding to Landsat/OLI bands 3, 4 and 5 (bands 2, 3 and 4 for TM/ETM+), respectively; 2 and 2 represent the reflectance of different mid-infrared bands corresponding to Landsat/OLI bands 6 and 7 (bands 5 and 7 for TM/ETM+), respectively.
Secondly, these indices were combined into a thematic dataset, in which the recorded information came from characteristics not only of the riparian zone as a temporary water Lastly, a composite RGB image was generated from the thematic dataset, with one color (i.e., red, green or blue) given to one of the above indices so as to achieve visualization of the riparian zone's water-land rotation. For instance, the colorful image shown in Fig. 4a was generated with green given to NDWI at high water level (166.94 m) on Oct. 28, 2017, red to NDVI and blue to MNDWI at lower water level (152.59 m) on Mar. 2, 2017.
Identification of riparian zones based on the optimized RGB-fusion image color clustering. Compared to geometric features (e.g., texture and shape) in images, color has greater stability and has been widely applied in computer-vision detection in the region of interest (ROI) [42][43][44] . Generally, if the ROI has its own unique color features in the image, it is certainly feasible to directly identify the ROI based on color difference 45 .
However, here, six kinds of composite RGB images may be generated from a threeband dataset, the ROI's color therefore could be changed by setting different color schemes (e.g., various colors of the same riparian zones in Fig. 4). Thus, a problem was inevitably caused-that is, which of the RGB-fusion images was optimized for precise identification of the riparian zone?
To solve the above problem and in view of perceptually accurate retrieval results on prominent colored areas in the RGB domain able to be produced from measures based on the angle of a color vector 46,47 , a modified vector-angular distance (MVAD) formula 48 was adopted to measure color difference between riparian and non-riparian zones in RGB color space. The MVAD formula is as shown in Eq. (4): represents the contribution from every single color component on the angular between two vectors in RGB space, = + + (10) is a correct factor to eliminate effects of the maximum at the bottom of RGB space. = max ( , , , , , ) 255 (14) Fig. 5 showed the measuring results on color difference from the two kinds of color matching schemes in Fig. 4. And the RGB-fusion image of Fig. 4b could be found better with the larger color distance between riparian and non-riparian zones as shown in Fig.  5b than that in Fig. 5a.
At last, RGB color clustering-based unsupervised classification algorithm was applied to the optimized RGB-fusion images, and clusters of the riparian zone at different water levels were all picked out.
Classification of land cover in the riparian zone. Approximate maximum coverage of the riparian zone was firstly derived from Landsat/OLI imageries on 21 January 2014 and 28 October 2017 with water levels of 139.56 m (close to the reservoir's dead water level of 139 m) and 166.94 m (close to the record highest water level of 167 m), respectively. All land-cover types in the largest riparian zone were then derived from existing land-use data and the late classification of Landsat images during 2010-2018. Using the data in the largest riparian zone (Fig. 6), real-time information on land-cover change at any water level could be extracted. Moreover, an error confusion matrix was built for evaluating the accuracy of the land-cover classification by using GPS-based field investigation data and Google Earth images at a resolution of 0.5 m. The resultant Kappa coefficients were all over 90%.
Estimation of NPS pollution loads. The ECM has been widely applied in estimating NPS pollution loads [49][50][51] . It uses an export coefficient approach to calculate the loads of  The universal ECM 49 is as follows: (Eq.15) where L is loss of nutrients (kg), is the export coefficient for nutrient source (kg/km 2 /yr), is the area of the catchment occupied by land-use type (km 2 ), or number of livestock type , or of people, is the input of nutrients to source (kg), and is the input of nutrients from precipitation (kg).
Because of the particular nature of Danjiangkou Reservoir's role as the headwater source of the SNWDP's Middle Route, nutrient sources ( ) in the reservoir's riparian zone comprised the area of each land-use type through fertilizer applications and nitrogen fixation-that is, people, livestock, and precipitation could not be included in the model for our research.  The export coefficient ( ) shows the rate at which nutrients were lost from each identifiable source to the surface drainage network 49 . It has strong spatial heterogeneity and varies among different land-use types 50,52 . Here, the of each land-use type in the riparian zone ( Table 2) was determined from a literature 52 , in which the export coefficients of different land-use types were determined by local precipitation, terrain, soil and vegetation fraction, etc.
the GR of pollution loads with water level fluctuation was calculated following Eq. (16) to compare the variation of TN and TP loads among different riparian zones. ∆ = /( − 0 ) (Eq.16) where ∆ is the GR of TN or TP loads in the riparian zone at water level j, is the total load of TN or TP in the riparian zone at water level j, and 0 is the reservoir's dead water level (i.e., 139 m).

Spatio-temporal variation of the riparian zone and resultant land-cover change.
Due to the water level change in Danjiangkou Reservoir (Fig. 2), the riparian zones varied markedly during 2000-2018. Especially before and after the SNWDP's Middle Route started transferring water on 12 December 2014, two typical changes of the riparian zones could be clearly found: firstly, the riparian acreage grew considerably compared to that before the water transform project (Fig. 7a). On 28 October 2017, the riparian area increased to 44 481 hm 2 with the then-water level (i.e., 166.94 m) closer to the project's normal retention level of 170 m; whereas, on 8 July 2014 with the then-water level of 141.73 m, the riparian area was only 1 625 hm 2 . On 3 January 2000, the reservoir's water level dropped to 135.12 m (close to its recorded lowest water level), and thus a 6 687 hm 2 special riparian zone (Fig. 8a) occurred below the reservoir's 139 m dead water level, which did not occur again after 2014. Secondly, distribution of the riparian zones of Danjiangkou Reservoir varied differently at the county scale (Fig. 8a -Fig. 8o). Despite the coverage of the riparian zone having expanded in three counties (Xichuan County, Yunyang County and Danjiangkou City) with the reservoir's water level rising, always around half of new riparian zones emerged in Xichuan County (Fig. 7a). And as the water level rose from 139 m to 160.33 m, the acreage of new riparian zones in the county enlarged to almost 182 km 2 , whereas the acreage was 80.09 km 2 and 66.05 km 2 in Danjiangkou City and Yunyang County, respectively.
Additionally, water level of the reservoir previously fluctuated with the seasonal changes in hydrology with a range from 139 m to 150 m before the SNWDP's Middle Route (Fig. 2), and a 17 230 hm 2 riparian zone hence emerged with 716.5 hm 2 of cultivated land and 101.4 hm 2 of shrub/grassland in the zone (Fig. 7b). But during the initial water transfer period of the SNWDP's Middle Route, the reservoir's water level was always over 150 m (Fig. 2), and the riparian zone apparently enlarged compared to those before the water transfer project started running. On 7 December 2014, the zone's area was 32 893 hm 2 and contained around 8 054.65 hm 2 of cultivated land and 799.67 hm 2 of shrub/grassland (Fig. 7a). Thereafter, the water level of Danjiangkou Reservoir mostly varied between 150 m and 160 m, although it was raised to 166.94 m on 28 October 2017 with the subsequent area of submerged cultivated land and shrub/grassland reaching 12 896.34 hm 2 and 6 768.47 hm 2 , respectively (Fig. 7b).

NPS pollution before and after the running of the SNWDP's Middle Route.
According to our field samples of soil pollution (Table 3), there were low background contents of TN and TP in the investigated riparian cropland. In all 59 soil samples, about 90% had a TN value less than 1.5 g/kg (Fig. 3a) and 80% less than 0.8 g/kg ( Table 3), which is consistent with the findings of Wang (2015) 21 . Even so, the potential risk of agricultural NPS pollution from the land-cover change would be enlarged, especially from the rotation between water and agricultural lands in the riparian zone. The reason lies in the fact that most of the agricultural lands in the riparian zone are cultivated at lower water levels for one-seasonal crop plantation with popular use of nitrogenous and phosphorus fertilizers 53 . It is known that nutrient loss from agricultural land use/land cover (e.g., cropland and grassland) is an important aspect of NPS pollution 49 . Whereas, the utilization rate of fertilizers was less than 40% in the study area, meaning redundant nitrogen and phosphorus in the soil will ultimately enter the reservoir through runoff as well as underground water 54 . Besides, the annual GRs of TN and TP loads in the riparian zones were estimated at different water levels. The result (Fig. 9) shows that their GRs were not only always higher than those before the water transfer project started running, but also had more significant spatiotemporal variation during the initial running period of the SNWDP's Middle Route; e.g., the GRs of TN loads dropped sharply from 11.85 t • a −1 on 28 October 2017 to 1.37 t • a −1 on 16 April 2016, while TP ranged from 0.56 t • a −1 to 0.06t • a −1 . Additionally, the GRs of the two loads in Xichuan County were the highest in the study area, and the GR of TN loads was far higher than that of TP loads regardless of whether the SNWDP's Middle Route was running.

Discussion
Extraction of riparian zone coverage. It is crucial to obtain reliable information on the coverage of riparian zones and resultant change of land cover in the zones at different water levels for the precise estimation of the soil pollution risk in water-diverted areas. Previously, a popular method 28 for extracting the coverage information of riparian zones was through the remote sensing of water coverage at high levels minus that at lower levels. However, this approach is prone to error in identification on riparian zones, especially for new riparian zones that have arisen from the engineering projects of artificially impounded and diverted water actions. Fig.10 shows how the widespread hill-shadow was often misidentified as water in the process of remotely sensed classification and had to be cleared through careful manual revision of the classification results. However, this confusion not only was difficult to be completely solved but also fueled further error in the extraction of riparian zone coverage.
In order to eliminate this misidentification, this paper took advantage of the unique properties of cyclical rotation between water and land in the remote sensing identification of new riparian zones. For example, in spite of greater similarity between water and hillshadow in multi-band Landsat OLI imagery (Fig.10a) or single-band MNDWI imagery (Fig. 10b), apparently different coloring of the riparian zone from water and hill-shadow could be observed in the RGB-fusion image of NDWI at higher water levels and NDVI and MNDWI at lower water levels (Fig. 10c). Therein, the riparian zones at different water levels could be easily and accurately extracted with a color clustering algorithm applied to the RGB-fusion image. Based on time-series data of riparian coverage (Fig. 8), a general law of riparian zone's variation with water level fluctuation in the study area was achieved, e.g., most new riparian zones emerged in Xichuan County and the total acreage of riparian zones enlarged significantly after the SNWDP's Middle Route started running, which matched well with the reported research results on the dynamic capacity of Danjiangkou Reservoir from Liu (2018) 40 .
Potential risk of soil pollution from change of the riparian zones. Change of riparian land cover in the study area was characterized by rotation between water and land, which was mostly composed of shoaly land and agricultural land (Fig. 6). According to our field investigation, the shoaly land was stable and prohibited cultivation, while most of the agricultural lands in the riparian zone around Danjiangkou Reservoir were cultivated for one-seasonal crop plantation at lower water levels. Because the popular use of mineral fertilizers in local conventional agricultural practices and nutrient loss from agricultural land use/land cover (e.g., cropland and grassland) are important sources of NPS pollution 49,52 , the GR of TN and TP loads from the riparian cropland were certainly larger with higher water levels (Fig. 9). Given that some of the shoaly land and grassland resulted from abandoned croplands, the true amount of NPS pollution loads in the riparian zone was in fact higher than the calculated amounts in this paper.
Moreover, we didn't investigate the loads of soil heavy metals and their change in the riparian zone with water level rising. However, an existing study 22 showed Cd and As were the dominant soil heavy-metal pollutants, in which the content of Cd (1.04 mg/kg) was significantly higher than its background value (0.065 mg/kg) in the sampling area. Because of the reservoir's role as the only one drinking-water source of the SNWDP's Middle Route 27 , close attentions in future should be paid to the potential risk of soil heavy-metal pollution from land-cover change in the riparian zone.

Conclusions
The purpose of this study was to present a scheme to acquire the reliable information of the riparian land-cover change, and to further explore the resultant soil pollution risk in the headwater area of the SNWDP's Middle Route. Given that riparian zones are characterized by high land-cover heterogeneity and their coverage cannot be precisely extracted from the remotely sensed identification of water coverage at high levels minus that at low levels, this study proposed a scheme for digitizing and visualizing the unique water-land rotational feature of the riparian zone and applied a color clustering algorithm to the optimized RGB-fusion images of NDWI at high water levels and NDVI and MNDWI at lower water levels. The distinguishing performance of the method were showed, especially at accurate identification of the riparian coverage from hill-shadow in the remote sensing imagery. A law of riparian zone variation and the resultant land-cover change with water level fluctuation was also achieved for the study area. The submerged land cover in the riparian zone at the water level of 166.94 m mainly comprised shoaly land, cropland and grassland, among which shoaly land was dominant. Additionally, most of the new riparian zones emerged in Xichuan County, and the total acreage of the riparian zones and submerged croplands enlarged significantly after the SNWDP's Middle Route started running.
Based on dynamic information of riparian land cover and field investigation data of soil pollution samples, an ECM-based calculation of NPS pollutant loads in the riparian zone was completed at different water levels during 2006-2018. It was found that the potential risk of soil NPS pollution from land-cover change in the riparian zone rose to a higher level than that before the implementation of this large-scale water-transfer project. The GRs of TN and TP loads were not only higher in the riparian zone during the initial running period of the SNWDP's Middle Route, but also possessed significant spatial variation insofar as the GRs of TN and TP loads in Xichuan County were always the highest in the study area. Moreover, the GR of TN loads was far higher than that of TP loads, regardless of whether the SNWDP's Middle Route was kept running.
As a large-scale inter-basin water transfer project, the SNWDP's Middle Route has the potential to cause complex and profound effects on water-diverted regions from the manmade activities of the impounded and diverted water. This study can provide accurate insights into the project's influence.