Long-Term Joint Operation of Cascade Reservoirs Using Enhanced Progressive Optimality Algorithm and Dynamic Programming Hybrid Approach

Dynamic Programming (DP) is one of the most classical methods adopted for reservoir operation. It reduces the computational efforts of complex high-dimensional problems by piecewise dimensionality reduction and provides the global optimums of the problems, but it suffers the “curse of dimensionality”. Progressive Optimality Algorithm (POA) has been used repeatedly in reservoir operation studies during last decades because it alleviates the “curse of dimensionality” of DP and has good convergence and extensive applicability. Nonetheless, the POA encounters two difficulties in multireservoir operation applications. One is the transfer interrupt problem that makes the search procedure hard to achieve free allocation of water between two non-adjacent stages, and another is the dimensionality problem that leads to a low convergence rate. In order to overcome these deficiencies, this study made some enhancements to the POA and proposed a hybrid approach combining the Enhanced POA and DP (EPOA-DP) for long-term operation of cascade reservoir systems. In EPOA-DP, the EPOA is employed to improve the quality of the solutions and DP is used to reduce the computational effort of the two-stage problem solution. The proposed approach was tested using a real-world four-reservoir cascade system and a ten-reservoir benchmark test example. The results demonstrated that it outperforms the POA both in computational time and quality of the solution.


Introduction
Joint operation of cascade reservoirs is an important way to improve water and hydropower utilization efficiency on basin scale. As hydraulic connections and compensatory relationships between various reservoirs are very complicated in a cascade system, joint operation process of cascade reservoirs is a complex high-dimensional constrained optimization problem which is hard to get the globally optimal solution. Numerous classical methods, such as Linear Programming (LP) (Su et al. 2020), Non-linear Programming (NLP) (Pericaro et al. 2020), Dynamic Programming (DP) (Zeng et al. 2019;Rani et al. 2020) and DP's variants (Jiang et al. 2018;Gong et al. 2021;He et al. 2021) and intelligent algorithms (Jahandideh-Tehrani et al. 2020;Liu et al. 2020) were applied to joint operation of cascade reservoirs. In contrast to other methods, DP and its variants are used more extensively in engineering practice because they can easily handle non-linear as well as non-differentiable factors and are able to produce satisfactory solutions in most cases. DP is one of the most classical and popular methods employed for reservoir operation problems. It produces the globally optimal solutions under a given magnitude of discretization, but suffers the "curse of dimensionality" in that the memory and computational requirements grow exponentially with the number of reservoirs. The dimensionality problem makes DP inapplicable to multireservoir operation.
In order to alleviate the "curse of dimensionality", a sequence of improvements have been made to DP and various variants have been developed. Larson and Korsak (1970) proposed the Dynamic Programming Successive Approximations (DPSA) technique which reduced the number of reservoirs to optimize with respect to at a time for DP by decomposing a multireservoir problem into a sequence of single reservoir ones. Heidari et al. (1971) provided the Discrete Differential Dynamic Programming (DDDP) approach whose most attractive feature is that it lessens the number of states for DP by iteratively searching in constantly changing corridors. Howson and Sancho (1975) presented the Progressive Optimality Algorithm (POA) which in contrast to DP is characterized by the fact that it decreases the number of stages to focus on at a time by transforming a multi-stage problem into a sequence of two-stage ones. Bai et al. (2015) and Zhang et al. (2016) applied the hybrid approaches combining the POA and DPSA to multireservoir operation. The hybrid approaches inherited the advantages of both methods. Cheng et al. (2014) and He et al. (2019) solved multireservoir operation problems using the Parallel Computing technique, which lessened the execution time of the search procedure. Feng et al. (2020a) applied the Latin Hypercube Sampling technique to optimization of a cascade reservoir system, which reduced the computational burden of DP.
As an improved version of DP, the POA decreases the computational effort required by DP via reducing time dimensions. Since the POA optimizes only with respect to two stages at a time, the computational efficiency is improved significantly in contrast to DP. Howson and Sancho (1975) proved that the POA produces the globally optimal solution when the objective function of the problem is convex. For these reasons, it has been widely used in multireservoir operation studies. Nevertheless, it encounters two difficulties in multireservoir operation applications: the transfer interrupt problem and the dimensionality problem (Feng et al. 2020b). These limit its performance in multireservoir operation problems.
In this study, some enhancements were made to the POA and a hybrid approach combining the Enhanced POA and DP (EPOA-DP) was developed to improve the computational efficiency of the search procedure as well as the quality of the solution for long-term operation of cascade reservoir systems. A real-world four-reservoir cascade system and a ten-reservoir benchmark test example were used to verify the feasibility and validity of the proposed approach. Due to our focus on improving the performance of the search procedure, the stochasticity of inflows was not considered in the model for simplifying calculations.

Objective Function
The object for long-term operation of cascade reservoir system is to maximize the benefits from the system over the entire operating period. Assuming that a cascade system consists of M reservoirs that are numbered 1, 2, ⋯, M sequentially from upstream to downstream and the operating period consists of T same time periods. Thus, the objective function can be written as: where B denotes the total benefits from the cascade system over the entire operating period; V i, t-1 denotes the storage in reservoir i at time period t-1; R i,t denotes the release from reservoir i at time period t; I i,t denotes the inflow to reservoir i at time period t; b i (V i,t-1 , R i,t , I i,t ) denotes the benefit function for reservoir i at time period t. Usually, R i,t is a function of V i,t-1 , V i,t and I i,t . Thus, Eq. (1) can be written as: where g i (V i,t-1 , V i,t , I i,t ) denotes the benefit function for reservoir i at time period t.

Rationale of Prototypical POA
In order to alleviate the "curse of dimensionality" of DP, Howson and Sancho (1975) presented a principle of progressive optimality based on Bellman's principle of optimality (Gross 2016), which states that: "The optimal path has the property that each pair of decision sets is optimal in relation to its initial and terminal values." Based on the above theory, the POA was developed by Howson and Sancho (1975) so that the multi-stage problem of reservoir operation was decomposed into a sequence of two-stage subproblems. The most essential feature of the POA is that the procedure advances toward the global optimum by successively and iteratively solving the two-stage problems which can be written as: where F denotes the benefits from the system over time periods t and t + 1; V i,t-1 's and V i,t + 1 's are known fixed values; V i,t 's are variables to optimize; I i,t 's and I i,t + 1 's are determined by V i,t-1 's, V i,t 's and V i,t + 1 's through Eq. (3). An iteration is a complete sweep of the two-stage problem solution from t = 1 to T-1. The POA always takes the computational results of the current two-stage problem as the initial conditions of the next one. After several iterations, the accuracy requirement will be satisfied and the algorithm will locate a solution. The implementation details are introduced step-by-step as follows: Step 1: Initialize the state vector Step 2: Fix the state set {V i,j |i = 1, ⋯, M; j = 0, 1, ⋯, t-1, t + 1, ⋯, T} and optimize the one {V i,t |i = 1, ⋯, M} until the maximization in Eq. (7) is achieved. After that, s i 's are updated with new state values of the reservoirs. This process is executed in order from t = 1 to T-1.
Step 3: Iteratively execute Step2. When the difference between the state vectors at two adjacent iterations satisfies the specified accuracy requirement, stop the iteration.

Transfer Interrupt Problem
The POA improves the quality of solutions by redistributing water between a pair of stages. For certain form of objective functions, such as convex functions, the POA can achieve free allocation of water between arbitrary two stages (especially two non-adjacent stages) by successively solving several two-stage problems. Howson and Sancho (1975) proved that the POA converges to the globally optimal solution when the objective function of the problem is convex. However, for some other forms of objective functions, the POA suffers the transfer interrupt problem and cannot ensure the global convergence. Here an empirical analysis is carried out to illustrate this belief. Let us focus on a three-stage optimization problem for single reservoir. It is presumed that V 0 and V 3 , the initial and final storages in the reservoir, are both equal to V t , the storage lower limit of the reservoir which is taken to be 5 units; Q 1 , Q 2 and Q 3 , the inflows to the reservoir, are taken to be the same value of 1 unit; L 1 , L 2 and L 3 , the evaporation losses of the reservoir, are negligible; V t , the storage upper limit of the reservoir, is 10 units; R t and R t , the release lower and upper limits of the reservoir, are taken to be 0 and 5 units respectively; the objective function is max (2R 1 + R 2 + 3R 3 ). This completes the specification of the problem.
Apparently, the optimal release policy for this problem is (0, 0, 3) and the optimal objective function value is 9. Now give the POA an initial policy (1, 1, 1) and the corresponding objective function value is 6. Starting at the initial policy, the POA will finally locate the policy (2, 0, 1) and the corresponding objective function value is 7. Obviously, the convergence policy is not the optimal one. The search process of the POA can be briefly described as follows: Starting at the initial policy (1, 1, 1), the POA first transferred 1 unit of water from stage 2 to stage 1 by solving the two-stage problem in connection with stages 1 and 2 so that the policy (2, 0, 1) produced. After that the water transfer was interrupted because any amount of water being transferred from one stage to its neighbor stage will lead to a decrease in objective function value. As a result, the POA converged to the policy (2, 0, 1) instead of optimal one.
Comparing the POA's convergence policy (2, 0, 1) to the optimal policy (0, 0, 3), it is not difficult to find that in order to produce the optimal policy the POA just needs to move 2 units of water from stage 1 to stage 3 on the basic of the convergence policy. But this cannot be achieved due to the transfer interrupt problem.
This case demonstrated that using the POA cannot achieve free allocation of water between two non-adjacent stages in some cases.

Dimensionality Problem in Two-Stage Problem Solution
The POA gets the optimal state combination of various reservoirs via evaluating the objective function values corresponding to every state combination in the two-stage problem solution. We might as well take time period t as an example to analyze the computational complexity of the POA for the two-stage problem solution. For clarity, assuming that arbitrary reservoir i (i = 1, 2, ⋯, M) has N discrete states at time period t and let n V i,t , where n = 1, 2, ⋯, N, denotes the nth discrete state. Then a complete state of the cascade reservoir system at time period t can be represented as ( n1 V 1,t , n2 V 2,t , ⋯, nM V M,t ) where n 1 , n 2 , ⋯, n M = 1, 2, 3, ⋯, N. Considering different discrete states for every reservoir, altogether N M combinations are required to evaluate in order to find the optimal one. Therefore, the computational complexity of the POA for the two-stage problem solution is O(N M ). It can be found that the POA suffers serious dimensionality problem by increasing the number of reservoirs.

Enhancements to Prototypical POA (EPOA)
Let us focus on the two-stage problem for cascade reservoir systems shown in Eq. (7). Taking cascade reservoirs 1, 2, ⋯, i as a whole for water balance calculation, we have: where W i denotes the sum of releases from reservoir i at time periods t and t + 1. As Q j,t , Q j,t + 1 , L j,t , L j,t + 1 , V j,t-1 and V j,t + 1 are known, W i can be calculated. That is, the sum of R i,t and R i,t + 1 keeps constant for any reservoir i (i = 1, 2, ⋯, M) in the two-stage problem solution of the POA. Thus, Eq. (7) can be written in following equivalent form: where V i,t-1 's and W i 's are known; R i,t 's are variables to optimize; V i,t 's, I i,t 's and I i,t + 1 's are determined by V i,t-1 's and R i,t 's through Eq. (3).
Eq. (9) shows that the essence of solving the two-stage problems of cascade reservoir systems using the POA is to achieve the optimal allocation of W i between time periods t and t + 1. That is, the POA always optimizes the values of R i,t 's and R i,t + 1 's while fixing those of R i,1 's, ⋯, R i,t-1 's, R i,t + 2 's, ⋯, R i,T 's at a time in order to achieve the maximization in Eq. (9).
In general, we extended the above principle to arbitrary two time periods such as t 1 and t 2 , t 2 = t 1+ k, t 1 = 1, 2, ⋯, T-1, k = 1, 2, ⋯, T-t 1 (i.e. let the procedure optimize the values of R 0 i;t 1 's and R 0 i;t 2 's while fixing those of R i,1 's, ⋯, R i;t 1 −1 's, R i;t 1 þ1 's, ⋯, R i;t 2 −1 's, R i;t 2 þ1 's, ⋯, R i,T 's in the two-stage problem solution). Thus, we have: where E denotes the benefits from the system over time periods t 1~t2 ; V i;t 1 −1 's, V i;t 2 's, R i,j 's and W i 's are known; R i;t1 's are variables to optimize; V i,j-1 's, V i;t2−1 's, I i;t1 's, I i,j 's and I i;t2 's are determined by V i;t1−1 's, V i;t2 's, R i;t1 's and R i,j 's through Eq. (3). Thus, free allocation of water between two non-adjacent stages is reachable by solving the problem shown in Eq. (10).

DP Model for Subproblem Solution of EPOA
In the prototypical POA, reservoir storage V i,t is taken as the state variable in the two-stage problem solution. Thus, the "curse of dimensionality" emerges because of a large number of * redundant calculations. By analyzing the characteristics of cascade reservoir systems, it is not hard to find that if we take reservoir release R i,t as the state variable in the two-stage problem solution of the POA, the dimensionality problem can be avoided and the computational efficiency can be improved. Thus, a DP model for the subproblem solution of the EPOA was developed as follows: (1) Stage variable Let the sequence number of cascade reservoir, namely i, be the stage variable. As the EPOA needs to optimize the values of R i;t1 's for M reservoirs in solving the subproblem shown in Eq. (10), the total number of stages is M.
(2) State variable The reservoir release R i,t is used as the state variable. It is found that the optimal path after stage i, namely ðR * iþ1;t 1 ; R * iþ2;t 1 ; ⋯; R * M ;t 1 ), is determined by R i;t1 , the reservoir state at stage i, and has nothing to do with the path to R i;t 1 before stage i, namely ðR 1;t 1 ; R 2;t 1 ; ⋯; R i−1;t 1 ). That is, the selected state variable satisfies the non-after-effect property of Markov chain.
Thus, a complete state for reservoir i can be written as ðR i;t 1 ; R i;t 1 þ1 ; ⋯; R i;t 2 −1 ; W * i −R i;t 1 ) where R i;t1þ1 , ⋯, R i;t2−1 are taken to be some fixed values. For clarity, let ðR i;t1 ; W * i −R i;t1 ) briefly express the state of reservoir i. Now, we undertake a discretization of the state interval [R i;t 1 ÀÀ; W * i −R i;t2 ÀÀ ] and constrain R i;t 1 to a set of N values. Thus, the number of states for a stage is N. (

3) Decision variable
Simultaneously, R i,t is taken as the decision variable.
(4) State transfer equation According to Eq. (3), the state transfer equation can be written as: (5) Recurrence equation We have the recurrence equation as follows: where f * i−1 R i−1;t1 À Á is the optimal benefit function of remaining stages (stages 1 to i-1).
The detailed process for the subproblem solution of the EPOA is shown in Fig. 1 which shows that the DP model transforms the M-dimensional subproblem of the EPOA into a Mstage 1-dimensional DP problem. As N 2 water balance calculations are required for a stage, the computational complexity of EPOA-DP for the subproblem solution is O(MN 2 ).

Flow Diagram of EPOA-DP
To start the algorithm, initial values are assigned to each parameter as well as variable and an accuracy or precision limit which is the basis of the stopping rule is defined. Generally, the initial values of variables are random generated or gained by experience. After this initialization, the algorithm runs as illustrated in Fig. 2.

Extension of EPOA-DP for Mixed Multireservoir Systems
EPOA-DP applies to cascade reservoir systems and cannot be used directly for a general mixed multireservoir system. It is found that any mixed multireservoir system can be decomposed into several cascade reservoir systems and any operation problem of cascade reservoir system can be solved using EPOA-DP. Hence a hybrid approach combining EPOA-DP and the Successive Approximations technique (Larson and Korsak 1970) (EPOA-DP-SA) was developed for mixed multireservoir systems. For simplicity but without loss of generality, we take the mixed multireservoir system shown in Fig. 3 as an example to briefly introduce the EPOA-DP-SA method. The system consists of 3 reservoirs where reservoirs 1 and 2 are first connected in parallel and then in series with reservoir 3. The system can be decomposed into two cascade subsystems, namely subsystems 1-3 and 2-3. The implementation details of EPOA-DP-SA are introduced step-by-step as follows: Step 1: Initialize the release set of reservoirs, i.e., {R i,t |i = 1, 2, 3; t = 1, ⋯, T}.
Step 4: Iteratively execute Step2 and Step3. When the difference of release trajectories at two adjacent iterations satisfies the specified accuracy requirement, stop the iteration. In general, the implementation details of EPOA-DP-SA for a mixed multireservoir system that contains m cascade subsystems are as follows: Step 1: Initialize the release set of reservoirs {R i,t |i = 1, ⋯, M; t = 1, ⋯, T}.
Step 2: Optimize the release trajectory of subsystem k (k = 1, ⋯, m) using EPOA-DP while fixing that of the remaining reservoirs (not contained in subsystem k). This process is executed in order from k = 1 to m.
Step 3: Iteratively execute Step2. When the difference of release trajectories at two adjacent iterations satisfies the specified accuracy requirement, stop the iteration.

Application to Four-Reservoir Problem
A cascade system consisting of four reservoirs (reservoirs A, B, C and D in order from upstream to downstream) was taken as an illustrative example to test the performance of EPOA-DP. The characteristic parameters for the various reservoirs are listed in Table 1. The operating period is one year consisting of 12 time periods (months). The object is to maximize the annual power generation from the system as: where ƞ i is the energy conversion efficiency of reservoir i; q i,t is the flow through the turbines for reservoir i at time period t; H i,t is the water head of reservoir i at time period t. The constraints include the water balance equation, the upper and lower limits of reservoir storage as well as release, the boundary conditions and the upper and lower limits of power output which can be written as: where P i,t is the power output of reservoir i at time period t; P i;t ÀÀ and P i;t ÀÀ are the upper and lower limits of power output for reservoir i at time period t, respectively. Other constraints are the same as Eqs.
(3)-(6). A simulating compute was conducted during 10 scenarios (years) using the POA, the EPOA and EPOA-DP, independently. The initial trajectory was produced by controlling reservoir releases so that the storages in each reservoir stayed the same. The objective function value corresponding to the initial trajectory was 70.883 (10 8 kW•h). The procedures were performed on a Pentium 2.0 MHZ PC and the optimization indexes derived from each method for N = 41 are given in Table 2.
As shown in Table 2, EPOA-DP and the EPOA produced almost equal power generations of the cascade system during 10 scenarios and the absolute values of relative deviations were less than 0.09% (the minor deviations, we believe, were caused by discretization). However, the execution time of EPOA-DP for each year was found much less than that of the EPOA, about 0.12% of the latter on average. These suggest that the DP model can distinctly lighten the calculation burden while ensuring the overall convergence in the subproblem solution of the EPOA. Compared to the POA, the power generation derived from EPOA-DP for each year Fig. 3 Sketch for a simple mixed multireservoir system had a certain increase but the corresponding execution time had a remarkably decrease. The average growth in power generation was about 0.75% and the biggest and second-biggest growths were 2.37% and 2.00%, respectively appearing in the 5th and 2nd years. The execution time of EPOA-DP for each year was much less than that of the POA, about 0.60% of the latter on average. These indicate that EPOA-DP can effectively improve the computational efficiency as well as the quality of the solution of the POA. The return, as a function of the number of iterations, for the 5th year is plotted in Fig. 4 and the corresponding execution time, as a function of N, is plotted in Fig. 5.
As shown in Fig. 5, the time curve of EPOA-DP was below those of the EPOA and the POA with a negligible rising trend. On the contrary, the time curve of the EPOA was above those of the other two methods and ascended sharply. Unsurprisingly, the time curve of the POA occupied the center region and went up relatively rapidly. On further analysis, it was found that the computational complexity of either the EPOA or the POA grew as N 4 but that of EPOA-DP grew as N 2 . As N increased further, for example to 201, the execution time of either the POA or the EPOA was more than 24 h but that of EPOA-DP was just about 17 s on the Pentium 2.0 MHZ PC.

Application to Ten-Reservoir Problem
A classical ten-reservoir problem was taken as another example to test the performance of EPOA-DP-SA for mixed multireservoir systems. The problem was first formulated and introduced to the literature by Murray and Yakowitz (1979) and more recently solved by Wardlaw and Sharif (1999) and Ahmadianfar et al. (2021). This problem is complicated not only in terms of size, but also because of many time-dependent constraints on storage. The tenreservoir system comprises reservoirs in series and parallel, as shown in Fig. 6 wherein a reservoir may receive supplies from one or more upstream neighbors.
The operating period is one year and the object is to maximize the power benefits from the system over 12 months. The objective function can be written as: where c i,t denotes the benefit coefficient of reservoir i at time period t. The values of c i,t 's were given by Murray and Yakowitz (1979). Inflows are defined for each of the upstream reservoirs and initial as well as final storages in various reservoirs are specified at the beginning of the operating period. In addition, the lower and upper limits of storage and release for each reservoir are not allowed to break. The inflow and constraint parameters are available in the work of Murray and Yakowitz (1979). The water balance equation can be written as: where U i denotes the set of reservoirs which directly supply reservoir i. The problem is beyond the capacity of DP and is difficult with the POA and the EPOA, but is relatively simple with the simplex method. From the simplex method, it was known that the global optimum for the problem is 1194.44. As shown in Fig. 6, the system contains 6 cascade subsystems, namely subsystems 1-7-10, 2-4-7-10, 3-4-7-10, 5-7-10, 6-7-10 and 8-9-10. EPOA-DP-SA was employed to solve the problem and POA-SA (Zhang et al. 2016) was used for a comparison purpose. The initial trajectory for either method was taken from the work of Murray and Yakowitz (1979) wherein it was termed "the static release policy" and the objective function value for it was 1080.9836. The procedures were performed on the Pentium 2.0MHZ PC. The return of EPOA-DP-SA was 1194.21, about 99.98% of the global optimum, and the execution time was 29.4 s for N = 201. At the same magnitude of discretization, the return of POA-SA was 1175.45, about 98.41% of the global optimum, and the execution time was 1.1 s. In contrast to POA-SA, the return of EPOA-DP-SA increased by 1.60% but the execution time only increased by 28.3 s. Although the global optimum was not obtained, the return of EPOA-DP-SA was pretty inspiring. Figure 7 displays the returns of EPOA-DP-SA and POA-SA after a given number of iterations.
Overall, given the size of the problem, the results of EPOA-DP-SA are satisfactory.

Conclusions
As a well-known improved version of DP, the POA suffers the transfer interrupt problem and the dimensionality problem. It fails to provide satisfactory scheduling results within a reasonable execution time and memory use. Thus, to overcome these defects, a hybrid approach EPOA-DP was proposed for optimization operation of cascade reservoirs. First an enhanced version of the POA Fig. 5 Execution times of the POA, EPOA and EPOA-DP for the 5th year Fig. 6 Sketch for the ten-reservoir system (EPOA) was developed to achieve free allocation of water between any two stages. Then a DP model was built to deal with the dimensionality problem in the subproblem solution of the EPOA. The test results show the superior performance of the proposed approach. The proposed approach is conceptually simple and easy to implement for large scale cascade reservoir operation problems. It provides an avenue toward overcoming the "curse of dimensionality" in multireservoir control.
Availability of Data and Material The datasets used or analyzed during study are available from the corresponding author on reasonable request.

Authors' Contributions
The whole work was independently accomplished by JC, including conceiving the approach, designing and performing the experiments, analyzing the data, interpreting the results and writing the paper.

Declarations
Ethics Approval Not applicable.

Consent to Participate Not applicable.
Consent for Publication Not applicable.

Conflict of Interest
No conflict of interest.