Evaluation of hydrological response under future climate change by revising meteorological data in the source region of the Yangtze River, China

The source region of the Yangtze River (SRYR) is located in the hinterland of the Tibetan Plateau (TP). The natural environment is harsh and hydrological and meteorological stations are sparse, so observed data are relatively scarce. To overcome this lack of data, the China Meteorological Forcing Dataset (CMFD) was used to correct the meteorological data so that they are closer to the real distribution over the SRYR surface. The Soil and Water Assessment Tool (SWAT) was used to verify the interpolation effect. Since the SRYR is an important water resource protection area, it is of great significance to study its hydrological response under future climate change. The Back Propagation (BP) neural network algorithm was used to integrate data extracted from six Global Climate Models (GCMs), and the SWAT model was then used to predict future runoff changes. The results show that the CMFD data set has high precision in the SRYR and can be used for meteorological data correction. After the meteorological data correction, the Nash-Sutcliffe efficiency increased from 0.64 to 0.70. Under future climate change, the runoff in the SRYR shows a decreasing trend, and the distribution of runoff during the year changes greatly. This indicates that the amount of water resources in the SRYR will decrease, which will bring challenges to water resources management in the SRYR.


Introduction
In general, because observation sites are scarce in plateau mountainous areas, studies there can be difficult to carry out or may be interrupted for lack of data (Lai et al. 2019; Li et al. 2018). However, owing to their unique natural geographical environment and the limited impact of human activities, plateau mountainous areas are ideal zones for studying various hydrological laws (Cruz et al. 2014; Gao et al. 2016; Sun et al. 2019). The analysis of snow melt runoff, changes in ecological environment carrying capacity, hydrological response under climate change, and the interpolation correction of meteorological data have become hot issues in the study of plateau mountainous areas (Jain et al. 2010; Khadka et al. 2014; Salzmann et al. 2014; Stewart and Prowse 2010). The TP, formed by geological structures and known as the Earth's Third Pole, has an average elevation of more than 4,000 m a.s.l. and the typical landform features of plateau mountainous areas (Royden et al. 2008; Yao et al. 2012). Glaciers and snow are widely distributed on the TP and are the source of many large rivers in Asia, including the Yangtze River, the Yellow River and the Ganges River (Zhang et al. 2013). Obstructed by the mountains, the water vapor flow in the monsoon season can only travel along the river valleys, so precipitation is strongly affected by the terrain (Zhou et al. 2012). In the Yarlung Zangbo River valley in the southeastern part of the TP, the average annual precipitation can reach 1,000-3,500 mm, whereas in the Qilian Mountains and Kunlun Mountains in the northwestern part of the TP, which are hardly affected by the monsoon climate, the average annual precipitation is only 100-500 mm (Kai et al. 2014; Lin et al. 2015; Yang et al. 2007). Because precipitation in plateau mountainous areas is affected by many factors, it differs greatly between regions.
Furthermore, measuring stations are scarce, which makes it difficult to calculate accurate average surface precipitation (Durán et al. 2015; Ji and Chen 2012; Xiao-Bo et al. 2009). Even the precipitation recorded at sites with different elevations can be quite different (Cuo and Zhang 2017). Accurate meteorological input is an important prerequisite for a hydrological model to produce good simulation results. To overcome the shortcomings of the measured data, a variety of satellite remote sensing meteorological products have been developed; the most popular include the Tropical Rainfall Measuring Mission (TRMM), the Climate Prediction Center MORPHing technique (CMORPH), and Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) (Leroy 2015; Papadavid et al. 2009; Shari et al. 2016; Zeng et al. 2012; Zeweldi and Gebremichael 2009). In addition, new data sets have been developed by combining remote sensing data with measured data, such as the CMFD and the China Meteorological Assimilation Driving Datasets for the SWAT model (CMADS) (Chen et al. 2011; He and Yang 2011; Meng 2016). The precision of these data sets is usually higher than that of the original remote sensing data, and they have great application value in areas lacking measured data.
In the long term, the climate has its own laws of change, while human activities can affect climate change trends. The global warming problem caused by excessive greenhouse gas emissions has received widespread attention around the world (Lu and Cai 2009; Yoo et al. 2010).
Extreme weather events have occurred more frequently in recent years, such as the extreme drought in southwestern China in 2010, the extreme rainstorm in Beijing in 2012, and the extreme low temperatures in the Midwestern United States in early 2019 (Sario et al. 2013). Plateau mountainous areas are more fragile and vulnerable to climate change. Rising temperatures have caused serious environmental problems there, such as the sharp retreat of glaciers, the thawing of frozen soils, and the deterioration of the ecological environment (Lan et al. 2015; Xia et al. 2011; Yong et al. 2015). Therefore, studying the hydrological response of plateau mountainous areas under climate change is very important, as it can provide theoretical support for addressing regional water resources management issues.
In this study, the measured meteorological data were first compared with the extracted CMFD data in the SRYR. The CMFD was then used to correct the meteorological data, and the SWAT model was used to verify the interpolation effect. Finally, the BP neural network algorithm was employed to integrate the meteorological data of the six GCMs, and the result was combined with the SWAT model to analyze the hydrological response under the future climate. This provides a new reference method for exploring hydrological laws in plateau mountainous areas that lack measured data.

Study area
The SRYR refers to the basin above the Zhimenda hydrological station (Fig. 1). The whole area lies within Qinghai Province, China, in the center of the TP, roughly between 90°43′-96°45′ E and 32°30′-35°35′ N. It covers an area of 137,800 km², and the average elevation is about 4,500 m a.s.l. (Liu et al. 2009). Glaciers are widely distributed in the basin, covering about 1,300 km² (Zhang et al. 2008). Most of the upstream tributaries originate from the glaciers of the Tanggula Mountains (Fig. 1a). The natural environment of the SRYR is harsh, and the land cover is mainly bare land and sparse plateau grassland (Fig. 1a, b and c). The natural environment in the lower reaches of the basin is relatively mild, with a small amount of forest. In terms of climate, the SRYR has a typical plateau continental climate and belongs to the semi-humid and semi-arid regions of the plateau subfrigid zone. The average annual temperature is between −1.7 and 5.5°C. July is the warmest month, with an average temperature that can reach 9.7°C, and January is the coldest, with an average temperature of around −13°C. Temperature generally decreases with altitude, so the downstream area is usually warmer than the upstream area. The upstream area has above-zero temperatures only from June to September, and perennially or seasonally frozen soil is widely distributed there. Precipitation in the SRYR is mainly formed by water vapor travelling along the Chin-sha River valley from the Bay of Bengal, and partly by water vapor moving northward from the central part of the TP. The average annual precipitation in the SRYR is between 200 and 500 mm and is mainly concentrated from May to September, which accounts for 85-95% of the annual precipitation.
Precipitation is mainly concentrated in the lower reaches in the southeastern part of the SRYR, with less precipitation in the northwest. Most of the time the precipitation is solid, such as snow or hail.

Spatial data
A digital elevation model (DEM) with a resolution of 90 m was downloaded from the United States Geological Survey (USGS) ftp site in GeoTIFF format (Fig. 1). In the SWAT model, the DEM data are used for watershed division and river channel generation. The land use data were downloaded from the Resource and Environment Data Cloud Platform of China, and the land use types were reduced to six categories (Fig. 2a). The soil data were extracted from the Harmonized World Soil Database (HWSD), the soil types were reclassified into five categories (Fig. 2b), and a user-defined soil database was then created. These three kinds of spatial data are needed to build the SWAT model of the SRYR and were uniformly projected into the WGS_1984_Albers coordinate system.

Meteorological and hydrological data
There are four meteorological stations in the SRYR (Fig. 1). Their data, which include daily precipitation and daily mean, maximum and minimum temperature, were downloaded from the China Meteorological Administration (CMA). The recording period is 1963 to 2016. The CMFD was downloaded from the Cold and Arid Regions Science Data Center, China. The data set merges a variety of data sources, such as TRMM satellite precipitation analysis data (3B42), Princeton forcing data and GLDAS data. Its quality is greatly improved compared with any single data set, and it has a high degree of credibility. In this study, the monthly precipitation and temperature data from 1979 to 2015 were extracted from the CMFD. The discharge data were obtained from the Zhimenda hydrological station at the outlet of the SRYR (Fig. 1), which has recorded daily discharge from 1957 to 2016.

Climate data
In order to simulate the hydrological response under future climate scenarios, six Global Climate Models (GCMs) that perform relatively well over China were selected: BCC_CSM1.1M, CanESM2, IPSL-CM5A-LR, NorESM1-M, GFDL-ESM2M and MPI-ESM-LR. The outputs of these GCMs under the Representative Concentration Pathway 4.5 (RCP4.5) scenario were downloaded from the Coupled Model Intercomparison Project Phase 5 (CMIP5) data sets (Bellenger et al. 2013; Knutti and Sedláček 2013). The period from 1963 to 2016 was treated as the historical period, and the period from 2020 to 2100 as the future period. The monthly precipitation and temperature were extracted from the six GCMs, and the BP neural network algorithm was then used to integrate these forecast data.

Meteorological data correction
The CMFD and GCMs are large spatial data sets. In this research, a program was used to extract the data at the 56 grid points with a spatial resolution of 0.5° × 0.5° in the SRYR (Fig. 3). The SRYR was divided into four zones (Zones 1 to 4) by equidistant division between the meteorological stations (Fig. 3). The data of the grid points in each zone were used to calculate the average surface precipitation and temperature of that zone. The quality of the CMFD data set was analyzed by comparing the data at each meteorological station with those at the nearest grid point (Yushu with point 8, Ulan Moron with point 24, Qumarleb with point 31 and Wudaoliang with point 47).
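The zonal averaging step can be illustrated with a minimal Python sketch. It assumes an unweighted arithmetic mean over the grid points assigned to each zone; the record layout, the zone assignments and all numerical values below are hypothetical and are not taken from the CMFD.

```python
from statistics import mean

# Hypothetical grid-point records: (point_id, zone, precipitation_mm, temperature_c).
# The values and zone assignments are made up for illustration; they are not CMFD data.
GRID_POINTS = [
    (8, 4, 480.0, 3.1),
    (9, 4, 455.0, 2.4),
    (24, 1, 290.0, -4.0),
    (25, 1, 310.0, -4.6),
    (31, 3, 410.0, -2.2),
    (47, 2, 300.0, -5.3),
]


def zone_means(points):
    """Return {zone: (mean precipitation, mean temperature)} using an unweighted mean."""
    by_zone = {}
    for _point_id, zone, precip, temp in points:
        by_zone.setdefault(zone, []).append((precip, temp))
    return {
        zone: (mean(p for p, _ in values), mean(t for _, t in values))
        for zone, values in sorted(by_zone.items())
    }


if __name__ == "__main__":
    for zone, (precip, temp) in zone_means(GRID_POINTS).items():
        print(f"zone {zone}: P = {precip:.1f} mm, T = {temp:.2f} °C")
```

An area-weighted or elevation-weighted mean would be a natural alternative if the grid points do not represent equal areas; the unweighted form is used here only for simplicity.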

Hydrological model
The SWAT model is a physically based distributed hydrological model that is widely used all over the world (Parajuli et al. 2010; Zhang et al. 2009). It can simulate hydrological cycles, contaminant migration and soil erosion, analyze changes in water quantity under different water resources management modes, and predict future hydrological changes. The SWAT model divides a large watershed into Hydrological Response Units (HRUs) according to land use type, soil type and terrain slope, using DEM data for the watershed delineation. Independent hydrological calculations are performed in each HRU (Easton et al. 2010; Zhou et al. 2013). Daily precipitation and maximum/minimum air temperature are the basic climatic inputs of the SWAT model. Although the model has its own weather generator that can generate these data, measured meteorological station data can improve the simulation accuracy. The SWAT model has a snow melt runoff module, which makes it well suited to areas with heavy snowfall, such as plateau mountainous areas or high-latitude areas. By comparing the average daily temperature with a threshold temperature, the SWAT model classifies precipitation as rain or freezing rain/snow. Snowfall is stored on the ground as a snow pack. Snow melt is controlled by the air temperature, the snow pack temperature, the snow melting rate and the snow cover area. The combination of multiple modules (such as the surface runoff, evapotranspiration and soil water modules) enables the SWAT model to simulate hydrological cycles under various conditions and makes it easier to improve the model or add new functions.
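The rain/snow classification and the temperature-controlled melt described above can be illustrated with a minimal sketch. It is a simplified temperature-index scheme, not SWAT's actual code: the function names, the threshold of 1.0°C, the melt factor of 4.5 mm per °C per day, the base melt temperature of 0.5°C, and the assumption that the snow pack temperature equals the daily mean air temperature are all illustrative.

```python
def partition_precipitation(precip_mm, mean_temp_c, snowfall_temp_c=1.0):
    """Classify a day's precipitation as (rain_mm, snow_mm) by a temperature threshold."""
    if mean_temp_c <= snowfall_temp_c:
        return 0.0, precip_mm
    return precip_mm, 0.0


def snow_melt(snow_pack_mm, snow_pack_temp_c, max_temp_c,
              melt_factor=4.5, melt_temp_c=0.5, snow_cover_fraction=1.0):
    """Temperature-index melt (mm/day), limited by the water stored in the pack."""
    driving_temp = (snow_pack_temp_c + max_temp_c) / 2.0 - melt_temp_c
    if driving_temp <= 0.0:
        return 0.0
    return min(snow_pack_mm, melt_factor * snow_cover_fraction * driving_temp)


def run_snow_pack(days, snow_pack_mm=0.0):
    """Step a snow pack through (precip_mm, mean_temp_c, max_temp_c) days.

    Assumes, for simplicity, that the snow pack temperature equals the daily
    mean air temperature. Returns a list of (rain_mm, melt_mm, pack_mm).
    """
    history = []
    for precip, mean_temp, max_temp in days:
        rain, snow = partition_precipitation(precip, mean_temp)
        snow_pack_mm += snow
        melt = snow_melt(snow_pack_mm, mean_temp, max_temp)
        snow_pack_mm -= melt
        history.append((rain, melt, snow_pack_mm))
    return history


if __name__ == "__main__":
    # Illustrative three-day sequence: a cold snowy day, a warm dry day, a warm rainy day.
    for rain, melt, pack in run_snow_pack([(10.0, -5.0, -1.0), (0.0, 2.0, 6.0), (5.0, 4.0, 9.0)]):
        print(f"rain = {rain:.1f} mm, melt = {melt:.1f} mm, pack = {pack:.1f} mm")
```

In this sketch, water leaves the pack only when the averaged pack and maximum air temperature exceeds the base melt temperature, which is why warming in the cold season can shift runoff toward the winter months.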

Integration method
The GCMs are climate prediction data sets with global coverage and coarse spatial resolution. Since different GCMs are developed by different organizations using different algorithms, their performance also differs. Even the same GCM has different simulation capabilities in different regions. In this study, in order to combine the advantages of different GCMs, the BP neural network algorithm was used to integrate the simulation results of the six GCMs. The BP neural network algorithm offers non-linear mapping, parallel distributed processing, self-learning and self-adaptation, and strong data fusion capability. The relationship between its input and output is:
$$y = f\left(\sum_{i=1}^{m} w_i x_i - \theta\right)$$
where $x = (x_1, \ldots, x_m)^T$ is the input vector, $y$ is the output, $w_i$ is the weight coefficient, $\theta$ is the threshold, and $f$ is the excitation function, which can be linear or nonlinear.
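A minimal sketch of a BP network of this kind is given below in pure Python. It is not the network used in this study, whose architecture and training settings are not stated here. It assumes one hidden layer of three tanh units, a linear output unit, full-batch gradient descent on the squared error, and synthetic data in which six noisy series stand in for the six GCMs; the names `train_bp` and `make_synthetic` and all parameter values are hypothetical.

```python
import math
import random


def make_synthetic(n=60, seed=1):
    """Six noisy copies of a seasonal signal (stand-ins for six GCMs) and the signal itself."""
    rng = random.Random(seed)
    samples, targets = [], []
    for k in range(n):
        truth = math.sin(2.0 * math.pi * k / 12.0)
        samples.append([truth + rng.gauss(0.0, 0.3) for _ in range(6)])
        targets.append(truth)
    return samples, targets


def train_bp(samples, targets, hidden=3, epochs=200, rate=0.05, seed=0):
    """Train a one-hidden-layer network by back propagation; return (predict, mse_history)."""
    rng = random.Random(seed)
    n_in, n = len(samples[0]), len(samples)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    theta1 = [0.0] * hidden  # hidden-layer thresholds
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    theta2 = 0.0             # output threshold

    def forward(x):
        # Each unit computes f(sum_i w_i * x_i - theta); f is tanh in the hidden layer.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) - t)
             for row, t in zip(w1, theta1)]
        return h, sum(w * hi for w, hi in zip(w2, h)) - theta2

    history = []
    for _ in range(epochs):
        g_w1 = [[0.0] * n_in for _ in range(hidden)]
        g_t1, g_w2, g_t2, sq_err = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, target in zip(samples, targets):
            h, y = forward(x)
            err = y - target
            sq_err += err * err
            g_t2 -= err
            for j in range(hidden):
                # Propagate the output error back to hidden unit j.
                delta = err * w2[j] * (1.0 - h[j] ** 2)
                g_w2[j] += err * h[j]
                g_t1[j] -= delta
                for i in range(n_in):
                    g_w1[j][i] += delta * x[i]
        history.append(sq_err / n)
        step = rate / n
        theta2 -= step * g_t2
        for j in range(hidden):
            w2[j] -= step * g_w2[j]
            theta1[j] -= step * g_t1[j]
            for i in range(n_in):
                w1[j][i] -= step * g_w1[j][i]
    return (lambda x: forward(x)[1]), history


if __name__ == "__main__":
    xs, ts = make_synthetic()
    predict, mse = train_bp(xs, ts)
    print(f"MSE before training: {mse[0]:.4f}, after: {mse[-1]:.4f}")
    print(f"integrated value for first sample: {predict(xs[0]):.3f}")
```

In an application of the kind described here, the inputs would be the six GCM values for a given month and the target would be the corresponding observed value over the historical training period.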

Evaluation criteria
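In order to evaluate the quality of the CMFD data set and the model performance, the volume balance error (R_E), the correlation coefficient (r²) and the Nash-Sutcliffe efficiency (E_ns) were chosen as the evaluation criteria:
$$R_E = \frac{\sum V_{sim} - \sum V_{obs}}{\sum V_{obs}} \times 100\%$$
$$r^2 = \frac{\left[\sum (V_{obs} - \bar{V}_{obs})(V_{sim} - \bar{V}_{sim})\right]^2}{\sum (V_{obs} - \bar{V}_{obs})^2 \sum (V_{sim} - \bar{V}_{sim})^2}$$
$$E_{ns} = 1 - \frac{\sum (V_{obs} - V_{sim})^2}{\sum (V_{obs} - \bar{V}_{obs})^2}$$
where $V_{sim}$ is the simulated value, $V_{obs}$ is the observed value, $\bar{V}_{sim}$ is the mean simulated value, and $\bar{V}_{obs}$ is the mean observed value.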
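The three criteria can be sketched in Python as follows. The forms used are assumed to be the commonly used definitions (relative difference of totals for the volume balance error, squared Pearson correlation for r², and the standard Nash-Sutcliffe efficiency); the function names are hypothetical.

```python
from statistics import mean


def volume_balance_error(sim, obs):
    """R_E in percent: relative difference between simulated and observed totals."""
    return (sum(sim) - sum(obs)) / sum(obs) * 100.0


def r_squared(sim, obs):
    """Square of the Pearson correlation between simulated and observed values."""
    sim_mean, obs_mean = mean(sim), mean(obs)
    cov = sum((s - sim_mean) * (o - obs_mean) for s, o in zip(sim, obs))
    var_sim = sum((s - sim_mean) ** 2 for s in sim)
    var_obs = sum((o - obs_mean) ** 2 for o in obs)
    return cov * cov / (var_sim * var_obs)


def nash_sutcliffe(sim, obs):
    """E_ns: 1 minus the ratio of squared error to the observed variance."""
    obs_mean = mean(obs)
    sq_err = sum((o - s) ** 2 for s, o in zip(sim, obs))
    return 1.0 - sq_err / sum((o - obs_mean) ** 2 for o in obs)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    biased = [2.0, 3.0, 4.0, 5.0]  # same shape as the observations, shifted by +1
    print(volume_balance_error(biased, observed))
    print(r_squared(biased, observed))
    print(nash_sutcliffe(biased, observed))
```

The biased series in the usage example has a perfect correlation with the observations but a much lower efficiency, which illustrates why a systematic offset can reduce E_ns while leaving r² high.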

Meteorological data correction analysis
The precipitation and temperature measured at the four meteorological stations and those extracted at the 56 points of the CMFD data set were interpolated separately for the multi-year average from 1979 to 2015 (Fig. 4). The interpolation results from the four meteorological stations roughly reflect the precipitation and temperature distribution but are insufficient in detail (Fig. 4a, c). The interpolation results from the 56 CMFD points are more detailed and reflect the local features of each region of the SRYR (Fig. 4b, d). The mean surface precipitation of the SRYR obtained by interpolating the four meteorological stations is 358 mm and the mean surface temperature is −2.2°C, while the mean surface precipitation obtained by interpolating the 56 CMFD points is 395 mm and the mean surface temperature is −5.1°C. The two estimates of mean surface precipitation differ only slightly (by 37 mm), whereas the mean surface temperatures differ greatly (by 2.9°C).
The statistical comparison between the precipitation and temperature observed at the meteorological stations and those extracted from the CMFD data set is shown in Table 1. The correlation coefficients (r²) between the observed precipitation and temperature at the four stations and the CMFD data are very high, all above 0.95. The Nash-Sutcliffe efficiency coefficients (E_ns) are also all above 0.95, except for the temperature at the Yushu and Qumarleb stations. In terms of the volume balance error (R_E), the differences are large for precipitation and relatively small for temperature. The precipitation and temperature observed at the four stations are highly consistent with the changes in the CMFD data, except that the observed temperature at the Yushu and Qumarleb stations is slightly higher than the CMFD temperature (Fig. 5), which is also the reason for the lower temperature Nash-Sutcliffe efficiency coefficients at these two stations. It is worth noting that the grid points and the meteorological stations do not coincide geographically, so some error is inevitable. Nevertheless, the verification results show that the CMFD data set has high precision and can be used for the interpolation of meteorological data.
The SRYR is divided into four zones (Fig. 3), and the average surface precipitation and temperature of each zone are calculated from the CMFD grid points and the station observations falling in that zone. Four virtual sites were set up using the center point and mean elevation of each zone (Table 2). The average surface precipitation and temperature of each zone are basically consistent with the trends measured at the meteorological station in that zone. However, there are still differences in details, such as maximum precipitation and maximum temperature.
According to the evaluation statistics (Table 3), the simulation results are relatively satisfactory, indicating that the SWAT model is applicable in the SRYR. The simulated runoff in the calibration and validation periods is basically consistent with the observed runoff. However, the recession process during the validation period differs from the observed process, and in some years the runoff peaks were not reproduced (Fig. 7). The monthly runoff from 1980 to 2015 simulated with the data recorded at the four meteorological stations is called "before correction", while the monthly runoff from 1980 to 2015 simulated with the meteorological data corrected by the CMFD data set is called "after correction". In these two scenarios, the Nash-Sutcliffe efficiency coefficients are 0.64 and 0.70, respectively (Table 3). After the meteorological data correction, the volume balance error (R_E), the correlation coefficient (r²) and the Nash-Sutcliffe efficiency coefficient all improved, although the increase is not very large (Table 3). In both scenarios the simulated runoff basically matches the observed runoff, but the simulation after the meteorological correction better reproduces the recession process and the runoff peaks (Fig. 8).
The precipitation and temperature extracted from the six GCMs are integrated using the BP neural network algorithm (Fig. 9).
The observed historical data from 1979 to 2015 are used as training data, and the future data from 2020 to 2056 are then predicted from the six GCMs. Fig. 9 clearly shows that the integrated historical data are highly consistent with the observed historical data. The integrated future data connect smoothly with the observed historical data, suggesting that the integration can reasonably predict future climate change. Except at the Yushu station, precipitation at the stations shows a rising trend. Temperature at all four stations also shows an increasing trend, although the growth rate slows compared with the historical temperature.
The predicted future climate data were used as the input of the SWAT model to predict future runoff changes in the SRYR (Fig. 10a). The historical average annual runoff is 429.88 m³/s, while the future average annual runoff is 378.76 m³/s, a significant drop of about 12%. The historical runoff shows a rising trend, while the future runoff shows a slight decrease. The standard deviations (σ) of the historical and future average annual runoff are 123.06 m³/s and 125.99 m³/s, respectively, indicating that the range of runoff variation will increase in the future. In the future period (2020-2056), the average annual precipitation and temperature both increase compared with the historical values; the increase in precipitation is small, whereas the increase in temperature is significant (Fig. 10b, c). Furthermore, the change trends and ranges become smaller.
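The relative change in mean runoff and the change in relative variability can be checked with a short calculation. The four input values are those reported above; the coefficient of variation (standard deviation divided by the mean) is an additional measure assumed here for illustration and is not reported in the study.

```python
# Mean annual runoff and standard deviation in m^3/s, as reported in the text.
HIST_MEAN, FUT_MEAN = 429.88, 378.76
HIST_STD, FUT_STD = 123.06, 125.99

# Relative change in mean annual runoff, in percent.
relative_change = (FUT_MEAN - HIST_MEAN) / HIST_MEAN * 100.0

# Coefficient of variation: an assumed, illustrative measure of relative variability.
cv_hist = HIST_STD / HIST_MEAN
cv_fut = FUT_STD / FUT_MEAN

if __name__ == "__main__":
    print(f"relative change in mean runoff: {relative_change:.1f}%")
    print(f"coefficient of variation: historical {cv_hist:.3f}, future {cv_fut:.3f}")
```

Because the mean decreases while the standard deviation rises slightly, the variability relative to the mean increases more than the standard deviations alone suggest.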
From the perspective of the distribution of runoff during the year, the future runoff shows an increasing trend from October to April, while the runoff from May to September shows a decreasing trend (Fig. 11a).
Compared with the historical distribution of precipitation during the year, the change in the future distribution is not obvious, and no significant pattern of change can be found (Fig. 11b). The change in the temperature distribution during the year is more obvious: the future monthly average temperature is higher than the historical monthly average temperature (Fig. 11c). It can be seen that under the future climate scenario, temperature has a greater impact on runoff. In particular, the temperature increase in winter accelerates the melting of glaciers and snow in the SRYR, which in turn increases winter runoff. Although the average annual precipitation increases slightly, the temperature increases significantly, which leads to an increase in evapotranspiration in the SRYR and thus a decrease in runoff. Under the future climate scenario, the water resources in the SRYR change greatly in both quantity and distribution.

Discussion
Unlike in plain areas, the meteorological data in plateau mountainous areas are affected by the terrain (Lai et al. 2019; Li et al. 2018). Blocked by mountains, water vapor is generally transported upstream along the valleys (Zhang et al. 2013). There is more precipitation on the slopes on both sides of the river valley and less precipitation on the leeward hillsides, which can also lead to differences in vegetation cover (Fig. 12). However, owing to the harsh conditions in plateau mountainous areas, the density of meteorological stations is low. It is therefore necessary to consider whether the data recorded at the meteorological sites are accurate, and using the observed data as the benchmark may introduce a certain error. More meteorological stations are needed to calculate surface average meteorological data that are closer to the real situation of basins in plateau mountainous areas.
Glacier and snow melt runoff in plateau mountainous areas has a large impact on the hydrological balance. In particular, the influence of glacial melt water on runoff is difficult to calculate in a hydrological model. Moreover, the correlation between precipitation and runoff in plateau mountainous areas is generally not high (Fig. 13), indicating that runoff is affected by other water sources. These factors make it difficult to significantly improve the simulation effect of the hydrological model even if the input meteorological data are infinitely close to the true values. There is a bottleneck that is hard to break through.
Climate change will have an impact on all aspects of the basin, and the hydrological cycle is also affected by changes in all aspects of the basin. Simply changing one or several variables to predict hydrological changes may have a certain bias.

Conclusions
In this study, the accuracy of the CMFD data set was first verified, and the data set was then used for meteorological correction. Secondly, the SWAT model was used to verify the meteorological data interpolation effect. Finally, the integrated results of the six GCMs were used to predict the hydrological response in the SRYR under future climate change. The main conclusions are as follows:
(1) The precipitation data extracted from the CMFD data set are highly consistent with the precipitation observed at the four meteorological stations, with E_ns all above 0.95. The consistency of the temperature data is slightly worse, but the accuracy is still high and meets the interpolation requirements.
(2) The SWAT model has good applicability in the SRYR; the E_ns values in the calibration and validation periods are all above 0.6. After the correction of the meteorological data, the simulation effect of the model improved to some extent, with E_ns reaching 0.70. However, because of the share of glacier and snow melt runoff in the SRYR, it is difficult to greatly improve the simulation effect.
(3) Under future climate change, the runoff in the SRYR decreases. In terms of the distribution of runoff during the year, the runoff in the winter half year increases significantly, while the runoff in the summer half year decreases significantly. These results indicate that the SRYR is developing in the direction of aridification, which will bring certain challenges to the management of water resources in the SRYR.
Declarations

Figure 1
The location and 3D surface view of the upstream (a), midstream (b) and downstream (c) of the SRYR. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.

Figure 2
Classification of the land use (a) and soil types (b) data in the SRYR.

Figure 3
The 0.5° × 0.5° latitude and longitude grid points and zonal division in the SRYR.

Figure 4
The precipitation and temperature interpolation of the meteorological stations (a, c) and 56 points of the CMFD data set (b, d).

Figure 6
Comparison between the corrected precipitation and temperature with the observed precipitation and temperature (the Ulan Moron with Zone 1 a, a'; the Wudaoliang with Zone 2 b, b'; the Qumarleb with Zone 3 c, c'; the Yushu with Zone 4 d, d').

Figure 8
Monthly runoff simulated before and after the meteorological correction, compared with the observed data.

Figure 9
The precipitation and temperature integrated by the BP neural network algorithm in the four meteorological stations (the Ulan Moron: a, a'; the Wudaoliang: b, b'; the Qumarleb: c, c'; the Yushu: d, d').

Figure 11
The historical  and future (2021-2056) mean runoff (a), precipitation (b) and temperature (c) distribution during the year.