Runoff forecast and analysis of the probability of dry and wet transition in the Hanjiang River Basin

Runoff prediction plays an important guiding role in water resources planning and management, flood and drought prevention. As the Hanjiang River Basin (HRB) is the water source of the middle line of the South-to-North Water Diversion Project, it has higher requirements for water resources accurate prediction. In order to analyze the prediction capabilities of different prediction methods for the HRB runoff, this study constructed 12 prediction models to simulate and predict the runoff of four hydrological stations in the HRB. Furthermore, the Markov Chain Monte Carlo (MCMC) method was used to analyze the transition probability of runoff from low-to-high (high-to-low). The results showed that the runoff of four hydrological stations in the HRB all showed a downward trend, and most of the sudden changes occurred in the 1980 s. The smoother the runoff changes, the easier it is to make accurate prediction. Among the 12 models, the quadric spline Markov forecasting model (QS-MFM), moving average Markov forecasting model (MA-MFM), Markov forecasting model (MFM), deep neural networks (DNN), and cubic exponential smoothing (CES) methods have stronger generalization ability and can more accurately predict the runoff of the HRB. The average relative error during the validation period is 0.27, 0.28, 0.33, 0.34 and 0.39, respectively. The logistic model can accurately simulate the change of runoff status in the HRB. The wet threshold of Baihe (BH), Huanglongtan (HLT), Huangjiagang (HJG), and Huangzhuang (HZ) is 819.9 m3/s, 207.4 m3/s, 1313.9 m3/s and 1681.7 m3/s, and the dry threshold is 480.4 m3/s, 130.6. m3/s, 817.8 m3/s and 1083.4 m3/s, respectively. The MCMC method can accurately estimate the parameters of the logistic model, and the low-high (high-low) runoff transition probability model constructed in the HRB can accurately calculate the low to high (high to low) runoff transition probability.


Introduction
From a long-term observation, the regional climate environment has a certain degree of continuity and stability, which is conducive to medium and long-term prediction of the regional climate environment Nygren et al. 2020;Yaduvanshi et al. 2021). There are also certain regular changes in runoff in a specific river basin, so runoff sequences can be predicted, which is of great significance to regional water resources management, drought and flood prevention (Ma et al. 2020;Shappell et al. 2021;Tian et al. 2018;Turunen et al. 2020). The rapid development of social economy has continuously increased the consumption of water resources, which has increased the contradiction between the imbalance of water supply and demand (Deng et al. 2020;Dong et al. 2021;Ferrucci and Vocciante 2021). In northern China, water shortages have become a major factor restricting regional economic and social development (Cheng et al. 2019;Li et al. 2017). In order to meet the water demand in the north, China uses the South-to-North Water Diversion Project to transport water to the north, and the Hangjiang River Basin (HRB) is the source of water for the middle route of the South-to-North Water Diversion Project (Feng et al. 2011;Guo et al. 2020;Qu et al. 2020;Yu et al. 2020). Cross-regional dispatch of water resources requires more refined water resources management, and reasonable water resources prediction has an important guiding role in the rational use of water resources Wang et al. 2015;Xie et al. 2019).
Hydrological prediction has important guiding significance for water resources management and flood control, and has extensive research in the field of hydrology (Piotrowski and Napiorkowski 2012). At present, the commonly used methods for hydrological forecasting include physical cause analysis methods, mathematical statistics, intelligent algorithms, and comprehensive forecasting methods based on numerical weather prediction (Badrzadeh et al. 2015;Liu 2014;Ouyang et al. 2007;Pan and Wang 2004). However, due to the influence of many factors such as climate change and human activities, the runoff of the basin showed fluctuations and increased inconsistency, these make the runoff forecast face great uncertainty (Moosavi et al. 2017;Wu 2018;Xiu-fen et al. 2003). Different runoff prediction methods are suitable for different scenarios, and the prediction methods are also constantly being developed and improved (Löwe et al. 2014). The numerical statistical method is based on the change characteristics of the runoff sequence itself. Its predictive ability decreases with the increase of the forecast period, but its calculation amount is small and flexible (Chua and Wong 2011;He et al. 2020). Model-based runoff prediction requires the construction of a regional hydrological model. Different hydrological models have different requirements for input parameters and variables, and their calculation workload is usually larger (Archer and Fowler 2008;Doycheva et al. 2017). The neural network based on machine learning has strong generalization ability for non-stationary series, so it also suitable for runoff prediction (Ghumman et al. 2011;Parida et al. 2006;Sedki et al. 2009).
The Hanjiang River is the main water source in the Jianghan Plain, and also the water source for the Middle Route of the South-to-North Water Diversion Project . Accurate hydrological forecasting of the HRB is beneficial to the security of water resources in this region (Doycheva et al. 2017;Moosavi et al. 2017). Mathematical statistical models are widely used in simulation and forecasting of time series due to their relatively simple structure and strong applicability (He et al. 2019). However, due to their weak generalization ability, the uncertainty of simulation will increase as the forecast period increases . The Markov chain can improve the generalization ability of the model and effectively improve the simulation accuracy of the model (He et al. 2019). This study constructed 12 prediction models to predict the runoff of four hydrological stations in the HRB and compared the advantages and disadvantages of the different models. On this basis, the conversion mechanism between dry and wet in the HRB was also analyzed by Markov chain Monte Carlo (MCMC) method. This study not only has a theoretical contribution to the prediction methods, but also provides an important reference for the HRB runoff prediction, drought and flood prevention.

Study area and data
The Hanjiang River is the largest tributary of the Yangtze River Basin (YRB), with a total length of more than 1,500 km and a drainage area of 1.59 9 10 5 km 2 . The climate of the HRB (106°12 0 -114°14 0 E, 30°08 0 -34°11 0 N) has obvious characteristics of subtropical monsoon climate. The climate is relatively mild, with an average annual temperature between 15 and 17°C. Precipitation in the HRB is relatively abundant, with an average annual precipitation of 600-1300 mm. The precipitation is mainly concentrated in the summer half year, accounting for more than 70% of the annual precipitation (Zhou et al. 2017). Among them, June, July, and August are particularly prominent, accounting for about 40-50% of the annual total precipitation. The runoff main source of the Hanjiang River is rainwater, followed by groundwater. Groundwater recharge accounts for about 15-20% of the annual runoff.
The runoff of the HRB varies greatly from year to year, and the maximum annual runoff is more than three times to the minimum runoff. The annual average runoff of the whole basin is about 60 billion m 3 . Due to abundant precipitation, water resources in the HRB are abundant. However, there are differences in the spatio-temporal distribution of water resources in the HRB. In order to control and develop the water resources, more than 2700 large, medium and small reservoirs have been built in the HRB. The Danjiangkou Reservoir, which is the water source for the middle route of the South-to-North Water Diversion Project, opened in October 2014. Affected by climate change and human activities, water resources in the HRB have shown a downward trend in recent years. Figure 1 illustrates the topography, water system and hydrological stations distribution of the HRB. It can be seen that the topography of the HRB is high in the west, and low in the east and south. The river network in the HRB is dense, and the Baihe (BH) and Huangzhuang (HZ) stations are the demarcation points of the upper, middle and lower reaches of the basin. The hydrological stations are distributed in the upper, middle and lower reaches. Except for Huanglongtan (HLT) station, which is on the tributary Zenghe, BH station, Huangjiagang (HJG) station, and HZ stations are all on the main stream. The runoff data used in this study was recorded by the four hydrological stations from 1960 to 2014.

ARIMA and SARIMA
Time series refers to a series of numbers with the same statistical indicator in the order by their occurrence. The main purpose of time series analysis is to find the futures based on the recorded historical data. There are four commonly used time series models: Autoregressive model (AR(p)), Moving average model (MA(q)), Autoregressive moving average model (ARMA(p, q)), and Autoregressive Integrated Moving Average model (ARIMA(p, d, q) ). It can be said that the first three are special forms of the ARIMA(p, d, q) model (Khan et al. 2020). The ARIMA model is established on the basis of a stationary time series, so the stationarity of the time series is an important prerequisite for modeling. The method of testing the stability of the time series model generally employs the ADF unit root test model. Of course, if the time series is unstable, it can be turned stable through certain operations (such as taking the logarithm, difference), and then the ARIMA model is performed to obtain stable time series forecast results. Finally, the inverse operation is performed on the prediction results (such as the inverse operation of the difference) to get the prediction results of the original data. The ARIMA model can be expressed as follows: where L is the lag operator, p is the betweenness of the autoregressive process, d is the betweenness of the difference, and q is the betweenness of the moving average process. u is the parameter of the autoregressive part of the model, X is the time series, h is the parameters of the moving average part, and is the error term. The Seasonal Autoregressive Integrated Moving Average (SARIMA) model adds seasonal factors on the basis of the ARIMA model and is suitable for unstable data with trend cycles (Carmona-Benítez and Nieto 2020). In addition to seasonal changes, this cycle can also be caused by other factors.

Exponential smoothing method
The exponential smoothing method is a special weighted moving average method, which strengthens the effect of recent observations on the predicted value. The weights assigned to the observations at different times are not equal, and the weight of the recent observations is increased, which can change the changing rate of the curve according to the weight coefficient. It does not abandon the past data, and only imposes a gradually weakening degree of influence, that is, as the data moves away, it gives a weight that gradually converges to zero. It is a time seriesbased forecasting method, which is generally divided into Fig. 1 The topography of the HRB and its distribution of hydrological stations single exponential smoothing (SES), second exponential smoothing (SEES) and cubic exponential smoothing (CES) (Dong et al. 2013;Smyl 2020). The calculation formula of the SES is as follows:

DNN
Neural network technology originated in the 1950 s, when it was called a perceptron, with an input layer, an output layer, and a hidden layer. The input feature vector reaches the output layer through the hidden layer transformation, and the classification result is obtained in the output layer. Deep Neural Networks (DNN) can be understood as a neural network with many hidden layers, also known as deep feedforward network (DFN) and Multi-Layer perceptron (MLP) (He et al. 2019;Yu et al. 2021). DNN is an emerging algorithm in the field of machine learning in industry and academia in recent years. DNN have significantly improved the simulation capability of nonlinear data and have been widely used in many fields.

Markov chain
In nature and social phenomena, some random phenomena follow Markov process, that is, the state of the system or process at t n can determine the state of the system or process at t nþi . It has nothing to do with the state of the system or process before t n . For the random process fX t ð Þ; t 2 Tg, the Markov process expression is as follows: A Markov chain (MC) is a random process with Markov properties, a special form of Markov process, which is discrete in state and time (Xiang et al. 2021). The MC has no aftereffect of Markov process, indicating that its subsequent state is only related to the current state and has nothing to do with the previous state. The expression formula of MC is as follows: where E ¼ fE 0 ; E 1 ; . . .; E n g is the state space set of the random process. The formula for calculating the n-step state transition probability is as follows: For all integers n ! 0 and i; j 2 I, the transition probability P n ð Þ ij of n steps has the following properties: In this study, based on the original Markov chain, the relative error is used as the state process of the Markov chain for state estimation, and the fuzzy set method is used to convert interval predictions into point predictions.

MCMC
The Markov Chain Monte Carlo (MCMC) method was produced in the early 1950 s. It is a Monte Carlo method that is simulated by a computer under the framework of Bayesian theory (Reuschen et al. 2020). This method introduces the Markov process into the Monte Carlo simulation, realizes the dynamic simulation in which the sampling distribution changes with the simulation, and makes up for the traditional Monte Carlo integration that can only be statically simulated. The MCMC is a simple and effective calculation method, which is widely used in many fields, such as statistics, Bayesian problems, and computer problems. The logistic regression model was used as a model to study the transition probability of dry and wet in the HRB, and the MCMC method was used to find the optimal parameters of the logistic regression model (Meyers et al. 2021). The logistic regression function is as follows: where a is the position parameter and b is the scale parameter. Under different parameter combinations, the function image is shown in Fig. 2. 3 Results Figure 3a1, b1, c1, and d1 depict the change state of runoff sequence of the four hydrological stations respectively. It can be seen from the trend line that the runoff of the four hydrological stations has a downward trend, and the runoff of the HJG station has the fastest decline rate, reaching -6.484 m 3 /sÁa -1 . It shows that there is a decreasing trend of water resources in the HRB. Figure 3a2 and a3 are the wavelet power spectra of the runoff at the BH station. It can be seen that the runoff at the BH station has a 3-4 years oscillation period, and the global wavelet spectrum (GWS) exceeds the 95% confidence level. There is also an oscillation period of about 8 years, but it is not significant. Figure 3b2, b3 are the wavelet power spectra of the runoff at the HLT station. It shows that the runoff at the HLT station has an oscillation period of about 4 years, and the GWS exceeds the 95% confidence level. While the other oscillation periods are not significant. Figure 3c2, c3 are the wavelet power spectra of runoff at HJG station. The runoff at HJG station has oscillation periods of about 4 years and 8 years at the same time, and the GWS exceeds the 95% confidence level. Figure 3d2, d3 are the wavelet power spectra of HZ station runoff. It can be seen from the figure that there is an obvious oscillation period of about 7-8 years, and the GWS exceeds the 95% confidence level while the other oscillation periods are not significant.

Runoff sequence analysis
Seasonal-Trend decomposition procedure based on loess (STL) is a common algorithm in time series decomposition. Based on locally weighted scatterplot smoothing, the data at a certain time is decomposed into trend component, seasonal component and residual component. Figure 4a1, a2, a3, and a4 are the STL decomposition results of the BH station runoff. From Fig. 4a2, it can be seen that the runoff was relatively stable before 1985. After 1985, the runoff began to decrease until the runoff began to rise again in 1998. It can be seen from Fig. 4a3 that the runoff at the BH station has a cyclical change of about 4 years, and the Fig. 2 The shape of the logistic regression function under different parameter combinations change trend has been intensified in recent years. After removing the trend and periodic changes of the runoff sequence, the residual is relatively stable (Fig. 4a4). Figure 4b1,b2,b3, and b4 are the STL decomposition results of runoff at HLT station. From Fig. 4b2, it can be seen that the runoff at the HLT station fluctuated before 1986, and after 1986 the runoff began to decrease continuously. The runoff of the HLT station has a cyclic variation of about 3-4 years (Fig. 4b3). From Fig. 4b4, it can be seen that the residual fluctuations are relatively large, indicating that the runoff sequence of the HLT station has large fluctuations. Figure 4c1, c2, c3, and c4 are the STL decomposition results of runoff at HJG station. From Fig. 4c2, it can be seen that the runoff of HJG station also continued to decrease after 1986, and then the runoff has a slight upward trend after 2000. From Fig. 4c3, it shows that the runoff at HJG station also has a cyclic variation of about 3-4 years. It can be seen that after removing the trend and periodic changes of the runoff sequence, the residual fluctuation is small (Fig. 4c4). Figure 4d1, d2, d3, and d4 are the STL decomposition results of HZ station runoff. From Fig. 4d2, it can be seen that the runoff at the HZ station has continued to decrease after 1990. The runoff at the HZ station also has a cyclic variation of about 3-4 years (Fig. 4d3). After removing the trend and periodic changes of the runoff sequence, the residual fluctuations are relatively large, indicating that the HZ station runoff has large fluctuations (Fig. 4d4).

Forecast based on runoff continuity
The SES, SEES, CES, ARIMA, SARIMA, and DNN methods were used to predict runoff by fitting and extending the original sequence reasonably. The period of 1960-2011 was taken as the calibration period, and 2012-2014 as the validation period. Autocorrelation function and partial autocorrelation function graphs are often used in time series analysis and forecasting. They describe the strength of the relationship between a time series observation and its previous observations. The parameters p and q in AR(p) and MA(q) models is determined by partial autocorrelation function (PACF) and autocorrelation function (ACF) respectively. Figure 5a1 and a2 is the ACF and PACF of the BH station runoff, respectively. From the figure, it can be seen that the runoff has a high correlation with the runoff of the previous step, the correlation exceeding the significance level of 95%, so p was taken as 1, q was also taken as 1. Figure 5b1 and b2 is the ACF and PACF of HLT station runoff, respectively. It can be seen from the figure that the runoff has little correlation with the previous runoff, and the correlation does not exceed the 95% significance level, so p is taken as 0, q as 0. Figure 5c1 and c2 is the ACF and PACF of HJG station runoff, respectively. The runoff and the previous runoff have the largest correlation within one time step, and the correlation exceeds the 95% significance level, so p was taken as 1, q as 1. Figure 5d1 and d2 is the ACF and PACF of HZ station runoff, respectively. It can be seen from the figure that the runoff has little correlation with the previous runoff, and the correlation is less than the significance level of 95%, so p was taken as 0, q as 0. Figure 6 illustrates the runoff simulation results in calibration period  of BH, HLT, HJG, and HZ hydrological stations using SES, SEES, CES, ARIMA, SARIMA, and DNN methods. Seen from left to right, Fig. 6a1 to d3 are the simulation results of runoff at four hydrological stations by SES, SEES, and CES respectively. It can be seen from the figure that SES has a better effect on runoff simulation, while SEES and CES are too flat for runoff process simulation and are not sensitive enough to capture the runoff changes. Figure 6a4 to d5 are the simulation results of runoff at four hydrological stations by ARIMA and SARIMA methods. It can be seen from the figure that ARIMA and SARIMA can better simulate the change of runoff, and SARIMA can better simulate the change of runoff peak value than ARIMA. Figure 6a6 to d6 are the simulation results of runoff at four hydrological stations by DNN method. The DNN method can basically simulate the runoff change process. However, the simulation value for some larger runoff processes is too small. The runoff simulation results of the four hydrological stations from 2012 to 2014 during the validation period are  Table 1, which is convenient for comparison with the simulation results based on the Markov chain method. Table 2 shows the runoff grades classification results of the four hydrological stations by using the mean-value mean squared error classification method. The runoff increases from grade 1 to grade 5. Grade 3 roughly represents the state of a flat water year, grade 1 roughly represents the state of a dry year, and grade 5 roughly represents the state of a wet year. Figure 7 illustrates the runoff of four hydrological stations reaches a stable state through the probability transition matrix in five initial states with the initial probability of 0.2. Figure 7a1 to a5 illustrate the probability of the BH station runoff reaching a stable state in the five steps. It can be seen that the probability of the BH station runoff in grade 1 is about 0.13, the probability in grade 2 is about 0.11, the probability in grade 3 is about 0.41, the probability in grade 4 is about 0.26, and the probability in grade 5 is about 0.09. Figure 7b1 to b5 illustrate the probability that the runoff of the HLT station reaches a stable state in the five steps. It can be seen from the figure that the probability of HLT station runoff in grade 1 is about 0.11, the probability in grade 2 is about 0.14, the probability in grade 3 is about 0.42, the probability in grade 4 is about 0.29, and the probability in grade 5 is about 0.04. Figure 7c1 to c5 illustrate the probability when the runoff of the HJG station reaches a stable state in five steps. From the figure, it can be seen that the probability of HJG station runoff in grade 1 is about 0.13, the probability in grade 2 is about 0.11, the probability in grade 3 is about 0.40, the probability in grade 4 is about 0.23, and the probability in grade 5 is about 0.12. Figure 7d1 to d5 illustrate the probability that the runoff of the HZ station reaches a stable state in the five steps. It can be seen from the figure that the probability of HZ station runoff in grade 1 is about 0.12, the probability in grade 2 is about 0.15, the probability in grade 3 is about 0.40, the probability in grade 4 is about 0.23, and the probability in grade 5 is about 0.10. Based on the original runoff Markov forecasting model (MFM), the relative error between the original runoff and the fitted runoff was used as the original data of the MFM, so that the hidden information in the original runoff data can be used to a greater extent (Corrêa et al. 2020;Stanislavsky et al. 2020;Zhang et al. 2021). Figure Table 1 lists the observed runoff (OR) and simulated runoff results of all methods during the validation period. Table 3 lists the relative error results between the simulated runoff and the measured runoff. Figure 9 is the cumulative histogram of the cumulative absolute error of all methods during the validation period. Combining the three, it can be seen that the more stable the runoff change, the more favorable it is for the prediction. For example, the 2012-2014 runoff change at the BH station is relatively stable. MFM, M-MFM, and QS-MFM all have good prediction results, and the cumulative absolute error is less than 0.5. However, the runoff of HJG and HZ stations in 2014 has a significant decrease compared with the runoff in 2013, which is such a drastic change as most methods cannot predict. On the whole, QS-MFM, MA-MFM, MFM, DNN, and CES have strong generalization ability for runoff prediction, and can adapt to the simulation of drastic changes in runoff. However, the simulation capabilities of ARIMA and SARIME forecasting methods are not stable. For example, ARIMA has good results for BH station runoff simulation, but has poor performance for HLT station runoff simulation. SARIMA has a poor effect on BH station runoff simulation, but has a good result in HLT station runoff simulation. This may be caused by the periodicity and seasonal variation of runoff at different stations. The CS-MFM has the worst simulation effect, and the runoff simulation ability of the four stations is relatively poor. It is worth noting that the same model has differences in the simulation effects of runoff at different stations, so only the models with excellent and stable simulation effects at the four stations can be used for hydrological prediction. Figure 10 shows the fitting results of runoff P-III curve of the four hydrological stations. Generally, the 25% quantile is used as the demarcation point for wet years and the 75% quantile is used as the demarcation point for dry years. The P-III curve was used to extract the high and low runoff boundary points of the four hydrological stations, and then the runoff time series were classified according to these demarcation points. The runoff greater than 25% quantile is classified as a wet year, and less than 75% quantile is classified as a dry year. Figure 11 shows the results by using the MCMC method to optimize the parameters of the logistic model, and then fitting the wet and dry conditions transition model. Among them, where the probability density of parameter distribution is the largest, then the point was selected as the optimal parameter. For example, the a parameter of the logistics model of BH station is 16.7562, and the b parameter is -0.0273. The logistic model with MCMC optimized parameters can simulate the high and low runoff changes well, and it is basically a smooth transition curve from the low-runoff state to the high-runoff state. Table 4 lists the transition probability of runoff from low to high (high to low). It can be seen from the table that the probability of a wet year is 0.29% when the runoff at the BH station is 400 m 3 /s, meanwhile its probability belong to a dry year is 99.71%. It can be concluded that when the flow is less than 400 m 3 /s, it is a dry year at the BH station. The probability of a wet year is 40.61% when the flow is 600 m 3 /s, meanwhile the probability of this year belong to a dry year is 59.39%. This indicates that when the flow is about 600 m 3 /s, it is basically belongs to the state of a normal water year at the BH station. The probability of a wet year is 99.96% when the flow is 900 m 3 /s, meanwhile the probability of this year belong to a dry year is 0.04%. This shows that when the flow is greater than 900m 3 /s at the BH station it has a high probability of being in a wet year. In the same way, the probability of the runoff at HLT, HJG, and HZ stations from low to high (high to low) can be obtained. Among them, the HLT station has the fastest conversion rate of drought and flood, and the flow that generates floods is about 6 times than that of drought. This shows that the smaller the catchment area, the greater the difference in the conversion flow between drought and flood, and the larger the catchment area, the smaller the difference between the conversion flow between drought and flood.

Discussion
The HRB is located at the north-south climate junction of China, and its water resources are severely affected by climate and environmental changes. It is also the water source for the Middle Route of the South-to-North Water Diversion Project (Zhu et al. 2021). The industry in the basin is well-developed and the population is densely populated (Guo et al. 2020). These make the water resources in the HRB are affected by human activities and change in time and space (Li et al. 2017). Affected by climate change and human activities, the runoff of the HRB also shows large fluctuations and even sudden changes. Intensified the inconsistency of the hydrological sequence, which increased the uncertainty of hydrological prediction. Whether runoff prediction is based on the change trend of the runoff sequence or based on the Markov chain, all information that needs is obtained from the original data Yu et al. 2020). If the data has a certain regular change, such as a linear trend change, this will make prediction an easy task. However, if there is a sudden change in the data, it will change the trend of the original sequence, making prediction a difficult task (Feng et al. 2011). Markov chain can minimize the impact of historical data on current data prediction, and is therefore more suitable for simulating watersheds with drastic changes in runoff.
The Mann-Kendall (MK) method was used to test the runoff of four hydrological stations in the HRB to analyze the change state of the runoff sequence (Fig. 12). It can be seen that there are multiple mutation points in the runoff of the four hydrological stations, especially in HLT, HJG, and HZ stations, which makes the prediction of runoff in different periods different. The runoff sequence before and after the sudden change often has different changing trends, which will increase the difficulty of predicting runoff changes (Cheng et al. 2019;Löwe et al. 2014). The method can be tried is to do mutation detection on the runoff series first, and then only use the data after the mutation. However, this will reduce the data that can be used, and will make the data lack representativeness Xie et al. 2019).
Furthermore, the runoff prediction can also based on the teleconnection and considered the climate background Qu et al. 2020). The runoff prediction based on teleconnection will be more towards the real physical basis, which will make the runoff prediction results have a physical basis (Badrzadeh et al. 2015). However, this method needs to find the factors that have the greatest impact on runoff changes, which will increases the difficulty of the operability of the prediction (Reuschen et al. 2020;Wang et al. 2015;Xie et al. 2019). Commonly used factors related to runoff prediction include atmospheric circulation factors, climate change factors, reservoir indexes, land use indexes, etc. Shappell et al. 2021;Zhang et al. 2021). In future study, a runoff prediction model based on teleconnection factors and weather background can be developed in the HRB.
Making the runoff prediction model adaptable to runoff simulation and prediction under the background of climate change and human activities.

Conclusions
In this study, the recorded runoff sequences of four hydrological stations in the HRB were used to simulate and predict runoff through 12 constructed models. Then the MCMC method was used to analyze the low to high (high to low) transition probability of annual runoff in the HRB. It contributes to the theory and application of mathematical statistical methods for runoff prediction, and also important to the prevention and control of droughts and floods in the HRB. The main conclusions can be drawn from this study are as follows: 1. The runoff processes recorded by the four hydrological stations in the HRB are all show different degrees of decline. Among them, the runoff at the HJG station has the fastest decline rate, reaching -6.484 m 3 /s•a -1 . The runoff of the four hydrological stations has an oscillating period of about 3-4 years, and the runoff showed an obvious downward trend in the mid-1980 s. 2. The smoother the runoff changes, the higher the probability for accurate prediction. Among the 12 models, QS-MFM, MA-MFM, MFM, DNN and CES all have strong generalization capabilities and can predict runoff changes more accurately. The prediction ability of ARIMA and SRIMA methods is unstable, and the prediction ability of CS-MFM is the worst.
3. The wet threshold for BH, HLT, HJG, and HZ is 819.9 m 3 /s, 207.4 m 3 /s, 1313.9 m 3 /s and 1681.7 m 3 /s, and the dry threshold is 480.4 m 3 /s, 130.6. m 3 /s, 817.8 m 3 /s and 1083.4 m 3 /s. The MCMC method can accurately estimate the parameters of the logistic model, and the high-low (low-high) runoff transition probability model constructed in the HRB can accurately calculate the low to high (high to low) runoff transition probability.
Although the prediction methods used in this study can simulate and predict the runoff of the HRB accurately, it is based on statistical methods. However, the runoff sequences of the HRB have mutation points, which will affect the accuracy and continuity of the prediction. Therefore, in order to build a more robust runoff prediction model, in the future study it can be considered to add relevant influencing factors such as atmospheric circulation and climate factors to the models to increase the generalization ability of the model.

Declarations
Conflict of interest The authors declare no conflict of interest.