Predicting the Peak Flow and Assessing the Hydrologic Hazard of Kessem Dam, Ethiopia using Machine Learning and RMC-RFA Software

doi:10.21203/rs.3.rs-1746769/v2

Overtopping during peak flow is a primary cause of failure in embankment dams. The Kessem River watershed of the Awash basin, in the Rift Valley of the Afar region of Ethiopia, was studied in detail to assess the safety of Kessem Dam using machine learning predictive models and the Risk Management Center-Reservoir Frequency Analysis (RMC-RFA) software. Recurrent Neural Network (RNN) predictive models, hybridized with the Soil Conservation Service Curve Number (SCS-CN) model, were used to simulate the river flow. Peak daily inflow to the reservoir is predicted to be 467.72 m³/s, 435.88 m³/s, and 513.55 m³/s in 2035, 2061, and 2090, respectively. For an Annual Exceedance Probability (AEP) of 0.0001, the hydrologic hazard analysis gives a peak discharge and maximum reservoir water level of 2,823.57 m³/s and 935.21 m for 2022-2050, 2,126.3 m³/s and 934.18 m for 2051-2075, and 11,491.1 m³/s and 942.11 m for 2076-2100. Kessem Dam may potentially be overtopped by a flood with a return period of about 10,000 years during the period 2076-2100. Quantitative hydrologic risk assessment is used in dam safety evaluation to decide whether the existing structure provides an adequate level of safety and, if not, what modifications are necessary to improve the dam's safety. Hence, the dam requires further risk analysis and dam safety modification to control this probable failure mode during the indicated period.

Dam Safety

Hydrologic Hazard Analysis

Kessem Dam

Machine learning

Peak Flow Prediction

Floods are among the most destructive natural disasters. They are caused by peak runoff from rivers during periods of extremely high precipitation and ravage the inundated areas [1-3], especially affecting communities living close to the river [4]. Increased human activity within the river system, climate change, and changes in land use and land cover in the upstream part of a watershed lead to changes in the watershed downstream [5-7]. The main cause of dam overtopping and river channel overflow is the peak flow occurring in the watershed, which raises the water level in the reservoir as well as in the river channel. The erosive action of water flowing over the crest during overtopping is the primary cause of dam body failure [8]. To control and reduce the effects of these events effectively, the frequency and magnitude of peak flow need to be predicted using advanced techniques [9]. In recent decades, floods have occurred repeatedly in various parts of Ethiopia [10, 11]. For instance, a disastrous flash flood in the Awash basin, caused by unexpected peak flow combined with storage beyond the design capacity, overwhelmed Meteka town downstream [11, 12].

[13] reviewed flood prediction using machine learning (ML) models, focusing on the state of the art of ML models in flood prediction and on giving insight into the most appropriate models. The review provides an extensive overview of the various ML algorithms used in the field and concludes that ML is the most promising prediction method for both long-term and short-term floods. In addition, machine learning models are simple, powerful, and robust data-driven predictive models in water systems [14, 15] for mapping the non-linear relationship between rainfall and runoff, even though they cannot represent the physical processes of the catchment [16]. These models establish the relationship between input and output from the data alone and contain no physical transformation function relating input to output. As a result, the direct use of data-driven models sometimes does not give appropriate results, because the physical processes are ignored. The ignored characteristics are the antecedent moisture content (AMC) and physical characteristics of the catchment such as geology, soil, slope, and land use/land cover (LULC) conditions [7]. To overcome this gap, a physically based SCS-CN model, which can represent the spatial variability of land surface characteristics such as LULC and soil type, was combined with the machine learning model to improve its performance in predicting peak flow.

In recent years, reservoir inflow forecasting, instantaneous peak flow estimation, and hydrological risk analysis have been developed or analyzed using machine learning techniques [17-21]. [22] estimated the instantaneous peak flow by combining Soil and Water Assessment Tool (SWAT) simulation with machine learning models. In this paper, the peak flow into the Kessem Dam reservoir is forecast using the selected machine learning model, and the results are integrated with the Risk Management Center-Reservoir Frequency Analysis (RMC-RFA) software for hydrologic hazard analysis and assessment. Recurrent Neural Network (RNN) predictive models, hybridized with the Soil Conservation Service Curve Number (SCS-CN) model, were used to simulate the river flow.

2.1 Description of the Study Area

The Awash basin is one of the major river basins of Ethiopia. The Kessem River, a part of this basin, lies in the Afar Rift in Afar Regional State. Kessem Dam, constructed on the Kessem River, is located at latitude 9°8'45" and longitude 39°55'31". It stores half a billion cubic meters of water to meet the irrigation demand of a 20,000 ha irrigation project (Fig. 1).

2.2 Data Collection

Hydro-meteorological data (stream flow, precipitation, and temperature), a digital elevation model (DEM), and land use land cover (LULC) and soil maps were collected for this study to predict flood magnitude and to assess the risk to dam safety. Daily precipitation data for 1988-2018 from 15 stations in and around the Kessem watershed were collected from the Ethiopian Meteorological Agency (EMA). Of these, 11 rainfall stations that contribute to the Kessem River were selected for this study based on the availability and continuity of their data. Maximum and minimum temperature data for 6 stations were also collected from EMA. Daily stream flow data for the Kessem River were obtained from the hydrology department of the Ministry of Water and Energy (MoWE): twenty years (1990 to 2009) at Aware Melka station and four years (2010 to 2013) at Kessem Dam station. Recorded water levels for the Kessem Dam reservoir from 2018 to 2021 were collected from the dam and irrigation development center of the Ethiopian Construction Works Cooperation (ECWC). The 1 arc-second (30 m × 30 m) Advanced Spaceborne Thermal Emission and Reflection Radiometer Global DEM (ASTER GDEM) was downloaded from the United States Geological Survey (USGS) EarthExplorer (https://earthexplorer.usgs.gov/). The Kessem sub-basins in the middle part of the Awash basin were extracted from the Awash DEM, following the procedural steps of watershed processing, for use in further analysis (Fig. 2).

2.3 Materials/Tools Used

The following materials and tools were used: ArcGIS software to analyse spatial data; selected artificial intelligence (AI) and machine learning models to predict flow and floods; the Statistical DownScaling Model (SDSM) version 4.2.9 [23] to downscale climate information from the coarse resolution of GCMs to the local or site level; Python 3.9 [24] as the programming language, within a Jupyter notebook, for writing and running the code; Matplotlib [25] for data visualization; the Hydrostats package [26] to evaluate model performance; RMC-RFA software for reservoir routing and analysis of the hydrologic hazard to the dam; the SFE_IFC MATLAB toolbox [27] to determine flood characteristics (start and end date, peak, and duration) and to develop flood hydrographs; and the Hydrologic Engineering Center Statistical Software Package (HEC-SSP 2.2) [28] for volume-frequency analysis.

2.4 Methodology

The observed climate and flow data used to calibrate and validate the selected models were collected from the National Meteorological Service Agency of Ethiopia and the Ministry of Water and Energy. DEM data and Landsat images were processed in ArcGIS with the Arc Hydro tools to delineate the catchment, estimate catchment characteristics, and reclassify LULC. The coarse-resolution GCM climate data, taken from the CanESM2 model outputs on the Canadian Climate Data and Scenarios (CCDS) portal, were downscaled to the watershed level with SDSM, using bias correction and the selected potential predictors, to project future precipitation and temperature. Landsat images with different bands were downloaded from USGS for the study area and reclassified into the LULC classes of the watershed by supervised classification. The projected climate data under the climate change scenarios and the LULC data were used as input to the SCS-CN model to estimate runoff at each sub-watershed outlet. The output of the SCS-CN model was then used as input to the ML model to predict the flow at the Kessem Dam watershed outlet and to estimate flood events within three future time horizons. The results were imported into RMC-RFA to assess the future hydrological risk posed by the selected flood events to the Kessem reservoir for dam safety evaluation.
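The runoff step of this workflow can be illustrated with the standard SCS-CN relation. The sketch below is not the authors' code: it assumes the conventional initial abstraction ratio of 0.2 and metric units, and the rainfall depths are hypothetical. The curve number 79.41 is the watershed value quoted for the PMF hydrograph later in the paper.

```python
def scs_cn_runoff(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (mm) for a rainfall depth p_mm using the SCS-CN method.

    ia_ratio = 0.2 is the conventional initial abstraction ratio (an assumption
    here; the paper does not state the value that was used).
    """
    if not 0 < cn <= 100:
        raise ValueError("CN must be in the interval (0, 100]")
    s = 25400.0 / cn - 254.0   # potential maximum retention, mm
    ia = ia_ratio * s          # initial abstraction, mm
    if p_mm <= ia:
        return 0.0             # all rainfall is abstracted, no runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Hypothetical daily rainfall depths (mm) applied to CN = 79.41
for p in (10.0, 50.0, 100.0):
    print(f"P = {p:5.1f} mm -> Q = {scs_cn_runoff(p, 79.41):6.2f} mm")
```

In a per-sub-watershed application, each sub-watershed would carry its own curve number derived from its LULC and soil classes.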

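Moreover, the overall accuracy (OA) and kappa coefficient (K) of the LULC classification were calculated by the following equations:

$$OA = \frac{TCS}{TS} \times 100 \qquad (1)$$

$$K = \frac{TS \times TCS - \sum_i (TGT_i \times TUC_i)}{TS^2 - \sum_i (TGT_i \times TUC_i)} \qquad (2)$$

where TS is the total number of samples, TCS is the total number of correctly classified samples, and TGT_i and TUC_i are the total ground truth and total user classified samples of class i.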

3.1 Climate Projection and LULC Changes

3.1.1 Climate Projection for the Future

Future temperature and precipitation in the watershed were projected with the CanESM2 climate model under the RCP2.6, RCP4.5, and RCP8.5 scenarios of the Coupled Model Intercomparison Project Phase 5 (CMIP5) experiments, downscaled with the Statistical DownScaling Model (SDSM). The bias correction and variance inflation values in SDSM were tuned by trial and error to obtain the highest model performance for precipitation, maximum temperature, and minimum temperature. The statistical results are given in Tables 1 to 3, and the mean values are plotted in Figs. 3 to 5. For precipitation, the model was calibrated with the predictors ncepp8_ugl, ncepp8_thgl, nceps500gl, ncepshumgl, and nceptempgl, with a bias correction of 1.356 and a variance inflation of 12. For minimum temperature, the predictors were ncepp1_ugl, ncepp1thgl, nceps500gl, nceps850gl, ncepshumgl, and nceptempgl, with a bias correction of 1 and a variance inflation of 12. For maximum temperature, the predictors used for calibration, validation, and testing were nceps500gl, ncepp1_zgl, ncepp5_fgl, ncepp5_vgl, ncepp500gl, ncepp5thgl, ncepp8_vgl, ncepp8_zgl, and nceptempgl, with a bias correction of 1 and a variance inflation of 12.

Table 1: Performance of the SDSM for downscaled precipitation after trial-and-error tuning

Period	Scenario	RMSE	NSE	R
Calibration		3.795	0.319	0.584
Validation		1.529	0.309	0.597
Testing	RCP8.5	3.446	0.324	0.625
	RCP4.5	3.429	0.331	0.613
	RCP2.6	3.371	0.353	0.601

After calibration and validation, the RMSE, NSE, and R of the downscaled precipitation during the testing period were 3.446, 0.324, and 0.625 under RCP8.5; 3.429, 0.331, and 0.613 under RCP4.5; and 3.371, 0.353, and 0.601 under RCP2.6 (Table 1). The SDSM performed best for downscaled precipitation during the testing period under the RCP2.6 scenario, which has the lowest RMSE and the highest NSE. This scenario was therefore used to project precipitation in the watershed for the future time horizons 2022-2050, 2051-2075, and 2076-2100, over which the projected precipitation varies considerably.
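The three scores reported in Tables 1 to 3 follow standard definitions. The sketch below shows one way to compute them in plain Python; it is an illustration with made-up observed and simulated values, not the SDSM output or the evaluation code used in the study.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs, sim):
    """Pearson correlation coefficient R."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return cov / math.sqrt(sum((o - mo) ** 2 for o in obs) *
                           sum((s - ms) ** 2 for s in sim))


# Hypothetical daily values, for illustration only
observed = [0.0, 2.5, 10.0, 4.0, 0.5]
simulated = [0.4, 3.0, 7.5, 5.0, 1.0]
print(rmse(observed, simulated), nse(observed, simulated), pearson_r(observed, simulated))
```

A lower RMSE and a higher NSE and R indicate a better fit, which is the basis on which the scenarios are ranked in the text.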

Table 2: Performance of the SDSM for downscaled minimum temperature after trial-and-error tuning

Period	Scenario	RMSE	NSE	R
Calibration		1.586	0.509	0.722
Validation		1.514	0.582	0.779
Testing	RCP8.5	1.722	0.312	0.662
	RCP4.5	1.697	0.332	0.670
	RCP2.6	1.678	0.346	0.676

During the testing period, the RMSE, NSE, and R of the downscaled minimum temperature were 1.722, 0.312, and 0.662 under RCP8.5; 1.697, 0.332, and 0.670 under RCP4.5; and 1.678, 0.346, and 0.676 under RCP2.6 (Table 2). The SDSM performed best for downscaled minimum temperature during the testing period under the RCP2.6 scenario. This scenario was therefore used to project minimum temperature for the future time horizons 2022-2050, 2051-2075, and 2076-2100.

Table 3: Performance of the SDSM for downscaled maximum temperature after trial-and-error tuning

Period	Scenario	RMSE	NSE	R
Calibration		1.265	0.489	0.715
Validation		1.179	0.522	0.780
Testing	RCP8.5	1.443	0.114	0.624
	RCP4.5	1.418	0.144	0.646
	RCP2.6	1.429	0.129	0.628

During the testing period, the RMSE, NSE, and R of the downscaled maximum temperature were 1.443, 0.114, and 0.624 under RCP8.5; 1.418, 0.144, and 0.646 under RCP4.5; and 1.429, 0.129, and 0.628 under RCP2.6 (Table 3). The SDSM performed best for downscaled maximum temperature during the testing period under the RCP4.5 scenario. This scenario was therefore used to project maximum temperature for the future time horizons 2022-2050, 2051-2075, and 2076-2100.

3.1.2 LULC Changes and Future Scenarios

LULC maps were prepared in ArcGIS 10.5 from Landsat 7 and Landsat 8 images downloaded from USGS for path 168, row 54, with different bands, for the acquisition years 2000, 2010, and 2020. For each year, the downloaded images were classified by the supervised classification method into seven LULC types: agricultural land, bare land, forest area, grass land, settlement area, shrub land, and water body (Fig. 6).

In the confusion matrix built from the collected sample data (Table 4), the total number of sample points (TS) is 71, the total number of correctly classified samples (TCS) is 57, and the sum over all classes of the products of the total ground truth and total user classified values is 1101. Substituting these values into Equations 1 and 2 gives an overall accuracy of 80.3% and a kappa coefficient of 0.75. This implies that 80.3% of the land use and land cover samples are correctly classified.

Table 4: Confusion matrix based on the collected ground truth samples and the user classification

Class Name	AG	BL	F	GL	S	SL	WB	TGT
AG	25	0	1	0	0	1	0	27
BL	0	1	0	3	0	1	0	5
F	0	0	7	0	0	1	0	8
GL	2	0	2	3	1	0	0	8
S	0	0	0	0	6	0	0	6
SL	0	0	1	1	0	10	0	12
WB	0	0	0	0	0	0	5	5
TUC	27	1	11	7	7	13	5	71
Total Samples (TS)								71
Total Correctly Classified (TCS)								57
Overall Accuracy (%)								80.3
Kappa Coefficient (K)								0.75

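Note: TGT = total ground truth; TUC = total user classified; AG = agricultural land; BL = bare land; F = forest area; GL = grass land; S = settlement area; SL = shrub land; WB = water body.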
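The accuracy figures in Table 4 can be checked directly from the matrix. The short sketch below recomputes the totals, the overall accuracy, and the kappa coefficient using the usual confusion-matrix definitions; the variable names are illustrative and the rows are copied from Table 4.

```python
# Rows follow Table 4 (AG, BL, F, GL, S, SL, WB); the last column of the
# table (TGT) is the row total and is recomputed here rather than typed in.
confusion = [
    [25, 0, 1, 0, 0, 1, 0],   # AG
    [0, 1, 0, 3, 0, 1, 0],    # BL
    [0, 0, 7, 0, 0, 1, 0],    # F
    [2, 0, 2, 3, 1, 0, 0],    # GL
    [0, 0, 0, 0, 6, 0, 0],    # S
    [0, 0, 1, 1, 0, 10, 0],   # SL
    [0, 0, 0, 0, 0, 0, 5],    # WB
]

row_totals = [sum(row) for row in confusion]            # TGT per class
col_totals = [sum(col) for col in zip(*confusion)]      # TUC per class
ts = sum(row_totals)                                     # total samples
tcs = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal sum
sum_products = sum(r * c for r, c in zip(row_totals, col_totals))

overall_accuracy = 100.0 * tcs / ts
kappa = (ts * tcs - sum_products) / (ts ** 2 - sum_products)

print(ts, tcs, sum_products)                       # 71 57 1101
print(round(overall_accuracy, 1), round(kappa, 2)) # 80.3 0.75
```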

LULC change detection for the period 2000-2020 was performed by supervised classification using the maximum likelihood classifier in ArcGIS 10.5. Table 5 shows the change in the area covered by each LULC class over the past 20 years.

Table 5: Changes in LULC from 2000 to 2020

Class Name	Areas Covered in 2000 (km²)	Areas Covered in 2010 (km²)	Areas Covered in 2020 (km²)	Change in %/year 2000-2010	Change in %/year 2010-2020	Change in %/year 2000-2020
AG	422.3	541.23	795.97	2.82	4.71	4.42
BL	16.46	15.681	12.414	-0.47	-2.08	-1.23
F	613.6	455.23	540.33	-2.58	1.87	-0.6
GL	241.6	219.87	55.768	-0.9	-7.46	-3.85
S	68.97	80.38	93.197	1.65	1.59	1.76
SL	1614.6	1664.6	1432.9	0.31	-1.39	-0.56
WB	0.004	0.516	46.926	14.24	8.99	6.52
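Most of the per-year rates in Table 5 are reproduced by a simple linear rate, that is, the change in area divided by the initial area and by the number of years. The sketch below assumes that formula, which the paper does not state explicitly. The water body row is left out because its tabulated rates do not appear to follow this relation.

```python
def annual_change_pct(area_start: float, area_end: float, years: int) -> float:
    """Average linear change per year, as a percentage of the starting area."""
    return (area_end - area_start) / area_start / years * 100.0


# Areas (km²) taken from Table 5: (2000, 2010, 2020)
areas = {
    "AG": (422.3, 541.23, 795.97),
    "F": (613.6, 455.23, 540.33),
    "GL": (241.6, 219.87, 55.768),
}

for name, (a2000, a2010, a2020) in areas.items():
    print(
        name,
        round(annual_change_pct(a2000, a2010, 10), 2),
        round(annual_change_pct(a2010, a2020, 10), 2),
        round(annual_change_pct(a2000, a2020, 20), 2),
    )
```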

The overall LULC changes between 2000 and 2020 were analysed quantitatively in terms of the decrease or increase in each class. Forest (0.6%), grass land (3.85%), bare land (1.23%), and shrub land (0.56%) decreased per year during this period (Table 5). On the other hand, agricultural land (4.42%), settlement area (1.76%), and surface water bodies (6.52%) increased per year over the same period. Based on this analysis, the future LULC change scenarios for each class in the Kessem watershed were defined as follows:

Scenario 1: During the period 2022-2050, forest, bare land, and shrub land areas are reduced, and all grass land is replaced by agricultural land, settlement area, and surface water bodies.

Scenario 2: During the period 2051-2075, forest, bare land, and shrub land areas are reduced further and replaced by agricultural land, settlement area, and surface water bodies.

Scenario 3: During the period 2076-2100, the area covered by agricultural land is reduced, resulting in the formation of bare land. In addition, forest and shrub land are reduced, while settlement area and surface water bodies increase.

3.2 Flow and Peak Flow Prediction for the Future Time Horizons

Effective rainfall, potential evapotranspiration (PET) [7], and stream flow are the main input datasets of the ML model. The datasets used to calibrate and validate the models are either observations or estimates derived from observations.

The effective rainfall was estimated with the SCS-CN model, taking into account the characteristics of each sub-watershed of the Kessem River. PET was computed with the Hargreaves method from the observed maximum and minimum temperature at each available station. A complete stream flow record for the Kessem River at Kessem Dam for 1990-2013 was assembled from two sources: the Aware Melka record for 1990-2009, transferred to the dam site by the area-ratio method, and the flow observed at Kessem Dam for 2010-2013. Based on the climate projections and the LULC scenarios, future flow was predicted using the ML models hybridized with the SCS-CN model. The models were built on daily time series using the Keras and TensorFlow packages in Python 3. Training and testing covered 1990 to 2013, the period for which observed discharge data are available: 70% of the data (January 1990 to October 2006) were used for training and 30% (November 2006 to December 2013) for testing. Three deep learning methods (LSTM, Bi-LSTM, and GRU) were implemented to predict flow over the historical period, and their performance was measured for the calibration and validation periods. The observed daily stream flow into the Kessem Dam reservoir was compared with the flow computed by each RNN model (LSTM, Bi-LSTM, and GRU) using thirty lag days. A network is trained to predict outcomes as accurately as possible. Its accuracy is quantified by the cost function, which penalizes the network when its predictions are wrong, and the optimal output is the one with the lowest cost. The mean squared error (MSE) [29] was used as the cost function for all networks.
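The data preparation described above, thirty lag days and a chronological 70/30 split, can be sketched as follows. This is a simplified, single-variable illustration with a placeholder series; the actual models also take effective rainfall and PET as inputs, and the function names here are hypothetical.

```python
def make_lagged_samples(series, n_lags=30):
    """Pair each value with the n_lags values that immediately precede it."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])   # the previous n_lags days
        targets.append(series[t])             # the day to be predicted
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.7):
    """Split without shuffling so that the test period follows the training period."""
    cut = round(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Placeholder daily flow series (not observed data)
flow = [float(day) for day in range(100)]
x, y = make_lagged_samples(flow, n_lags=30)
(train_x, train_y), (test_x, test_y) = chronological_split(x, y, 0.7)
print(len(train_x), len(test_x))   # 49 21
```

Keeping the split chronological avoids leaking information from the testing period into training, which matters for time series such as daily stream flow.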
Each training iteration works on a subset of the training data called a batch. The number of samples per batch (the batch size) is a hyper-parameter that is normally found by trial and error; the best value for all models was 128. In each iteration, the cost function is computed as the MSE over these 128 samples of observed and predicted stream flow. A full pass through the training data is called an epoch, and in each epoch the network simulates the stream flow time series. As in other networks, the number of neurons and layers in a recurrent network can be chosen freely. So that the models can be compared with each other, all recurrent networks were given an identical structure: two hidden layers of 12 units each, with a dropout of 10% between the layers, and the output of the last layer at the final time step connected to a dense layer with a single output neuron. The sigmoid activation function is applied in the hidden layers of all networks. Its output is bounded between 0 and 1 and it is differentiable everywhere, which suits gradient-based training. Each method was run with different numbers of epochs. The optimal hyper-parameters obtained after trial and error are given in Table 6.

Table 6: Optimal hyper-parameter network

Hyper-parameter	Values
Neuron	12
Optimization	Adam
Learning rate	0.001
Activation function	Sigmoid and Tanh
Max Epoch	4000
Batch size	128
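To make the role of the batch size concrete, the sketch below evaluates the MSE cost separately on consecutive batches of 128 samples, as described for the training iterations. It only illustrates how the cost is formed per batch; it does not train a network, and the flow values are placeholders.

```python
def mse(observed, predicted):
    """Mean squared error over one batch."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def batch_costs(observed, predicted, batch_size=128):
    """MSE of each consecutive batch; the last batch may be shorter."""
    return [
        mse(observed[i:i + batch_size], predicted[i:i + batch_size])
        for i in range(0, len(observed), batch_size)
    ]


# Placeholder data: 300 samples give batches of 128, 128, and 44 samples
obs = [10.0] * 300
pred = [11.0] * 128 + [12.0] * 128 + [10.0] * 44
print(batch_costs(obs, pred))   # [1.0, 4.0, 0.0]
```

During training, an optimizer such as Adam (Table 6) would update the weights once per batch using the gradient of this cost.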

The results of the optimized models were evaluated with the Hydrostats package using statistical error assessment techniques. Hydrostats supports both statistical and graphical evaluation through error metric functions applied to the observed and simulated flow. The predicted and observed flows are plotted in Fig. 7. Several descriptive statistics can be used to evaluate predictive models; in this study, RMSE, NSE, R², and KGE were selected, and the results of the different methods are presented in Table 7. Among the RNN methods, Bi-LSTM performed the best.

Table 7: Statistical evaluation of the performance of ML models

ML Models	Training (1990-2006)				Testing (2006-2013)
	RMSE	NSE	R²	KGE	RMSE	NSE	R²	KGE
BiLSTM	3.873	0.968	0.994	0.702	17.547	0.744	0.749	0.832
LSTM	4.276	0.962	0.974	0.777	19.878	0.672	0.727	0.690
GRU	4.003	0.966	0.982	0.743	20.703	0.644	0.681	0.712

The calculated discharges match the observed discharges well, as indicated by the high NSE and small RMSE values of the three ML models, and the Bi-LSTM model outperforms LSTM and GRU. Therefore, the Bi-LSTM model was used in this study to predict the flow of the Kessem River at Kessem Dam for the three future time horizons.

3.3 Hydrologic Hazard Analysis for Kessem Dam

3.3.1 Inflow Hydrograph Shape

For the future inflow to Kessem Dam during 2022-2100, three events were selected: September 2035 from the 2022-2050 time horizon, September 2061 from the 2051-2075 time horizon, and September 2090 from the 2076-2100 time horizon. These are the largest peak flow events predicted to occur in the Kessem Dam watershed through 2100.

Inflow hydrograph shapes for the three time horizons (Fig. 8) were derived from the ML model predictions. The September 2035, September 2061, and September 2090 events were used for rescaling the sampled inflow flood events.

The PMF hydrograph was developed from the design peak PMF inflow of Kessem Dam, Qp = 9,237 m³/s (10,000-year return period), using the SCS dimensionless method with a time to peak (Tp) of 33.39 hours and the watershed characteristics (L = 136.64 km = 448,294 ft, Sl = 1.238%, CN = 79.41, and Tc = 51.016 hours) to compute the discharge Q at each time t. Fig. 9 shows the resulting PMF hydrograph as computed discharge Q versus time t. This hydrograph was compared with the peak discharge and stage frequency results of the hydrologic hazard analysis to determine whether Kessem Dam is at risk from future flood events.
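As an illustration of how a dimensionless hydrograph is scaled by Qp and Tp, the sketch below uses the gamma-function approximation of the SCS dimensionless unit hydrograph, q/qp = [(t/Tp) exp(1 - t/Tp)]^m. The shape exponent m = 3.7 is an assumption commonly associated with the standard SCS shape; the paper uses the tabulated SCS dimensionless ordinates, so the values below approximate that curve rather than reproduce Fig. 9.

```python
import math


def scs_gamma_hydrograph(qp, tp, times, m=3.7):
    """Discharge at each time from the gamma approximation of the SCS curve.

    qp: peak discharge, tp: time to peak (same time unit as `times`),
    m: assumed shape exponent (not taken from the paper).
    """
    flows = []
    for t in times:
        if t <= 0:
            flows.append(0.0)
        else:
            ratio = t / tp
            flows.append(qp * (ratio * math.exp(1.0 - ratio)) ** m)
    return flows


QP = 9237.0   # m³/s, design peak PMF inflow quoted in the text
TP = 33.39    # hours, time to peak quoted in the text
hours = [0.0, 0.5 * TP, TP, 2.0 * TP, 5.0 * TP]
for t, q in zip(hours, scs_gamma_hydrograph(QP, TP, hours)):
    print(f"t = {t:7.2f} h   Q = {q:9.1f} m³/s")
```

By construction the curve passes through Qp at t = Tp and recedes towards zero after about five times the time to peak.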

3.3.2 Inflow-Volume Frequency Curve

The volume-frequency curve of Kessem Dam was developed from the Log Pearson Type III distribution, defined by the mean, standard deviation, skew coefficient, and effective record length for each of the three future time horizons. The volume-frequency analysis followed Bulletin 17C with the Expected Moments Algorithm (EMA) and was performed in HEC-SSP (Table 8).

Table 8: The result of statistics from volume frequency analysis for each future time horizon

Period/Statistics	mean (of log)	standard deviation (of log)	skew (of log)	effective record length
2022-2050	2.446	0.125	0.654	29
2051-2075	2.375	0.107	0.450	25
2076-2100	2.461	0.174	1.087	25

Based on the results of the volume-frequency analysis, volume-frequency curves with 90% uncertainty bounds were computed for each future time horizon (Fig. 10).
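The statistics in Table 8 define a Log Pearson Type III distribution, from which a quantile for a given AEP follows as 10^(mean + K × standard deviation), where K is the frequency factor. The sketch below uses the Wilson-Hilferty approximation for K, which is an assumption made here for illustration; HEC-SSP uses its own numerical procedure, and the approximation becomes less reliable for very rare events and larger skews. The result is in the same unit as the data from which the log statistics were computed.

```python
from statistics import NormalDist


def lp3_frequency_factor(skew: float, aep: float) -> float:
    """Wilson-Hilferty approximation of the Pearson III frequency factor K."""
    z = NormalDist().inv_cdf(1.0 - aep)   # standard normal quantile
    if abs(skew) < 1e-6:
        return z                          # zero skew reduces to log-normal
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def lp3_quantile(mean_log: float, sd_log: float, skew: float, aep: float) -> float:
    """Quantile of a Log Pearson Type III distribution with base-10 log statistics."""
    return 10.0 ** (mean_log + lp3_frequency_factor(skew, aep) * sd_log)


# Statistics for 2022-2050 from Table 8
for aep in (0.01, 0.001):
    print(aep, round(lp3_quantile(2.446, 0.125, 0.654, aep), 1))
```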

3.3.3 Flood Seasonality Analysis

For a threshold of 206.4 m³/s, the sample sizes for the periods 2022-2050, 2051-2075, and 2076-2100 are 35, 31, and 37, respectively. These samples are large enough to analyse flood seasonality in each time horizon. The flood seasonality histograms developed for each time horizon are presented in Fig. 11. For 2022-2050, flows are normal from October through May, June to September are the wettest months, and the flood season peaks in August. For both 2051-2075 and 2076-2100, flows are normal from November through June, July to October are the wettest months, and the flood season peaks in September.
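A flood seasonality histogram of this kind counts, for each calendar month, the events whose flow exceeds the threshold. The sketch below shows the counting step on hypothetical (month, peak flow) records and assumes that the events have already been separated into independent floods, which is a preprocessing step not shown here.

```python
from collections import Counter


def monthly_flood_counts(events, threshold=206.4):
    """Number of events above the threshold in each month (January to December).

    events: iterable of (month, peak_flow) pairs, month in 1..12, flow in m³/s.
    """
    counts = Counter(month for month, flow in events if flow > threshold)
    return [counts.get(month, 0) for month in range(1, 13)]


# Hypothetical flood events, for illustration only
sample_events = [(8, 300.0), (8, 250.5), (9, 210.0), (1, 50.0), (9, 206.4), (7, 410.2)]
histogram = monthly_flood_counts(sample_events)
print(histogram)                             # [0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0]
print(histogram.index(max(histogram)) + 1)   # month with the most events: 8
```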

3.3.4 Reservoir Starting-Stage Duration Analysis

Initial reservoir levels and their exceedance probabilities were estimated from the daily reservoir elevations over the period of record. The duration curves indicate that the median reservoir elevation for June through October is approximately 926 m, with an interquartile range (25th to 75th percentile) of about 922 to 928 m (Fig. 12). This elevation range was taken as the initial reservoir water surface elevation for routing the hydrographs. August produces the lowest pool duration curve. The flood seasonality analysis showed that floods are most likely to occur in August and September, and the dam is operated with this seasonality in mind. Large events are therefore most likely to occur in August and September, but they are also most likely to coincide with low starting pools, which mitigates some of the risk of large peak stage events in summer.

3.3.5 Hydrologic Hazard Curve of Kessem Dam

Once the RMC-RFA simulation has run, the software automatically creates the Stage-Frequency Curve and Hydrologic Hazard Curve plots (Figs. 13 and 14). The median curve represents the uncertainty in stage frequency and peak discharge frequency due to natural variability. The 95% uncertainty bounds represent the uncertainty in stage and peak discharge frequency due to knowledge uncertainty, whereas the expected curve represents the combined uncertainty due to both natural variability and knowledge uncertainty. These curves are used for the semi-quantitative risk analysis of Kessem Dam.

The hydrologic hazard analysis (HHA) produces the expected peak discharge and the corresponding peak stage for return periods of 100 to 1,000,000 years in each time horizon (Tables 9 and 10, respectively).

Table 9: Expected peak discharge (m³/s) for each future time horizon

Return period (years)	2022-2050		2051-2075		2076-2100
	Expected	95% Bounds	Expected	95% Bounds	Expected	95% Bounds
100	867.67	669.3-1168	809.4	621-1082	1,230.9	759-2,308
1000	1,383.95	835-2294.8	1,178.9	705-1796	2,770.7	1,003-8,013
10,000	2,823.57	987-5430.6	2,126.3	790-3468	11,491.1	1,357-17,217
100,000	5,686.2	1,115-10,429.3	3,738.5	851-5811	17,292.3	1,636-17,726
1,000,000	12,916.72	1,147-15,231.7	7,104.67	887-9625	17,733.7	1,900-17,777

Table 10: Expected peak stage (m) for each future time horizon

Return period (years)	2022-2050		2051-2075		2076-2100
	Expected	95% Bounds	Expected	95% Bounds	Expected	95% Bounds
100	931.92	931.4-932.6	931.77	931.2-932.4	932.7	931.6-934.5
1000	932.94	931.7-934.5	932.54	931.5-933.7	935.13	932.2-940.9
10,000	935.21	932.1-938.4	934.18	931.7-936.1	942.11	932.8-943
100,000	938.59	932.2-941.8	936.37	931.8-938.7	943	933.3-943
1,000,000	942.42	932.4-943	940.1	931.9-941.6	943	933.7-943

Kessem Dam has a spillway discharge capacity of 6,180 m³/s at the maximum water surface elevation of 939.5 m. Comparison of these values with the stage-frequency and peak discharge-frequency curves indicates that the spillway can pass floods with return periods of 100 to 100,000 years during the future time horizons from 2022 to 2075.

During the period 2022-2050, the expected peak discharge for an AEP of 1/10,000 is 2,823.57 m³/s. From the hydrologic hazard analysis, the lower and upper 95% confidence limits of the 10,000-year peak discharge are 987 m³/s and 5,430.6 m³/s, respectively. The corresponding expected peak stage for an AEP of 0.0001 is 935.21 m, with lower and upper 95% bounds of 932.1 m and 938.4 m, respectively. These values do not exceed the spillway discharge capacity of 6,180 m³/s or the maximum water surface elevation of 939.5 m.

During the period 2051-2075, the expected peak discharge for an AEP of 1/10,000 is 2,126.3 m³/s. The lower and upper 95% confidence limits of the peak discharge are 790 m³/s and 3,468 m³/s, respectively. The corresponding peak stage for an AEP of 0.0001 has an expected value of 934.18 m and lower and upper 95% bounds of 931.7 m and 936.1 m, respectively. These values also do not exceed the spillway discharge capacity of 6,180 m³/s or the maximum water surface elevation of 939.5 m.

During the period 2076-2100, the expected peak discharge for an AEP of 1/10,000 is 11,491.1 m³/s. From the hydrologic hazard analysis, the lower and upper 95% confidence limits of the 10,000-year peak discharge are 1,357 m³/s and 17,217 m³/s, respectively. The corresponding expected peak stage for an AEP of 0.0001 is 942.11 m, with lower and upper 95% bounds of 932.8 m and 943 m, respectively. The expected values exceed both the spillway discharge capacity of 6,180 m³/s and the maximum water surface elevation of 939.5 m.

The results of this initial hydrologic hazard curve characterization and flood hydrograph routing (Fig. 14) indicate that Kessem Dam may potentially be overtopped by a flood with a return period of about 10,000 years during the period 2076-2100. This indicates that Kessem Dam does not meet the Reclamation hydrologic hazard criteria for overtopping, because it cannot pass a PMF in the 2076-2100 time horizon. Therefore, the dam requires further risk analysis and dam safety modification to control this probable failure mode during the period 2076-2100.
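The comparison behind this conclusion can be reproduced from Table 10 by finding, for each time horizon, the smallest tabulated return period at which the expected peak stage exceeds the maximum water surface elevation of 939.5 m. The sketch below does only that lookup; the data are copied from Table 10 and the function name is illustrative.

```python
MAX_WATER_SURFACE_M = 939.5   # maximum water surface elevation quoted in the text

# Expected peak stage (m) by return period (years), from Table 10
expected_stage_m = {
    "2022-2050": {100: 931.92, 1_000: 932.94, 10_000: 935.21, 100_000: 938.59, 1_000_000: 942.42},
    "2051-2075": {100: 931.77, 1_000: 932.54, 10_000: 934.18, 100_000: 936.37, 1_000_000: 940.1},
    "2076-2100": {100: 932.7, 1_000: 935.13, 10_000: 942.11, 100_000: 943.0, 1_000_000: 943.0},
}


def first_exceeding_return_period(stages, limit):
    """Smallest tabulated return period whose expected stage exceeds the limit."""
    for return_period in sorted(stages):
        if stages[return_period] > limit:
            return return_period
    return None   # the limit is not exceeded at any tabulated return period


for horizon, stages in expected_stage_m.items():
    print(horizon, first_exceeding_return_period(stages, MAX_WATER_SURFACE_M))
```

For the first two horizons the limit is exceeded only at the 1,000,000-year return period, whereas for 2076-2100 it is already exceeded at 10,000 years.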

This research predicts the peak flow that may load the reservoir in the future using a machine learning model, and develops the hydrologic hazard curve (HHC) of Kessem Dam with the RMC-RFA software to assess the hydrologic risk for dam safety evaluation. Machine learning models, hybridized with the SCS-CN model, were used to simulate future stream flow and peak inflow from the future LULC scenarios and the climate projections under the RCP emission scenarios. Based on stream flow simulation with recurrent neural networks and on inflow volume-frequency analysis, the hydrological hazard was assessed with RMC-RFA for three future time horizons (2022-2050, 2051-2075, and 2076-2100). The RNNs (Bi-LSTM, LSTM, and GRU) were trained on 70% of the flow data, and the remaining 30% was used for performance evaluation. Each network used two hidden layers. The number of neurons in the hidden layers, the learning rate, and the number of iterations play a basic role in modeling accuracy, and their optimal values were obtained by trial and error. Dropout in the network structure helps to prevent overfitting. The efficiency of the proposed predictive approach was evaluated by simulating daily inflow to the Kessem Dam reservoir in the Kessem watershed. The Bi-LSTM network performed better than the other RNN architectures and estimates stream flow with fairly good accuracy; its performance in simulating stream flow for the baseline period (1990-2013) was acceptable. These networks are an effective method for modeling stream flow and predicting peak flow. During the periods 2022-2050 and 2051-2075, the peak discharge does not exceed the PMF and overtopping does not occur.
The semi-quantitative risk analysis showed that Kessem Dam may potentially be overtopped by a flood with a return period of about 10,000 years during the period 2076-2100, since the dam does not pass the PMF for that time horizon and therefore does not meet Reclamation hydrologic hazard criteria for overtopping. The dam requires further risk analysis of the consequences and damage, and of the dam safety modifications needed to control this probable failure mode. For the period 2022-2075, by contrast, the dam is safe: all critical peak flows pass through the dam spillway, which means that the existing spillway capacity is sufficient to pass the excess water without overtopping. This research contributes scientific information on future floods into the Kessem Dam reservoir and on the hazard they pose, in support of dam safety and risk assessment.

Acknowledgement

The authors would like to thank the Ethiopian Ministry of Water and Energy, the Ethiopian National Meteorological Agency, the Ethiopian Construction Works Corporation, and the United States Geological Survey (USGS) for providing essential data for this study. The authors are also thankful to Arba Minch University, which provided logistical support for completing this research work.

Funding

No funding has been received from any source to conduct the research work.

Conflict of Interests

There is no conflict of interest among the authors in publishing this research article.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.

1. Getahun, Y. and S. Gebre, Flood Hazard Assessment and Mapping of Flood Inundation Area of the Awash River Basin in Ethiopia using GIS and HEC-GeoRAS/HEC-RAS Model. Journal of Civil & Environmental Engineering, 2015. 5(4): p. 1-12.

2. Khalaf, M., et al., A Data Science Methodology Based on Machine Learning Algorithms for Flood Severity Prediction. IEEE Congress on Evolutionary Computation (CEC), 2018. 7(18).

3. Subramanya, K., Engineering Hydrology. Third ed. 2008, New Delhi: Tata McGraw-Hill Publishing Company Limited.

4. Raghunath, H., Hydrology: Principles, Analysis and Design. Second ed. 2006, New Delhi: New Age International (P) Ltd.

5. Sivapalan, M., et al., Linking flood frequency to long‐term water balance: Incorporating effects of seasonality. Water Resources Research, 2005. 41(6).

6. Vo, N., et al., A deterministic hydrological approach to estimate climate change impact on river flow: Vu Gia–Thu Bon catchment, Vietnam. Journal of Hydro-environment Research, 2015. 11: p. 59-74.

7. Mahmood, R., et al., Impacts of land use/land cover change on climate and future research priorities. Bulletin of the American Meteorological Society, 2010. 91(1): p. 37-46.

8. Jandora, J. and J. Riha, The Failure of Embankment Dams due to Overtopping. 2008: Václav Houf.

9. Le, X., et al., Application of Long Short-Term Memory (LSTM) Neural Network for Flood Forecasting. Water, 2019. 11(7): p. 1387.

10. Assefa, T., Flood Risk Assessment in Ethiopia. Civil and Environmental Research, 2018. 10(1): p. 35-40.

11. OCHA, Ethiopia: Floods. United Nations Office for the Coordination of Humanitarian Affairs, Update No. 3. 2020.

12. Shumie, M., Evaluation of Potential Reservoir Deficiency Due to Climate Change, Kesem Kebena Dam, Ethiopia. Journal of Environmental Geography, 2019. 12(1-2): p. 33-40.

13. Mosavi, A., P. Ozturk, and K.-w. Chau, Flood Prediction Using Machine Learning Models: Literature Review. Water, 2018. 10(1536): p. 1-40.

14. Mich, L., Artificial Intelligence and Machine Learning. Handbook of e-Tourism, 2020: p. 1-21.

15. Hosseiny, H., et al. A Framework for Modeling Flood Depth Using a Hybrid of Hydraulics and Machine Learning. 2020 [cited 10 1].

16. Kratzert, F., et al., Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrology and Earth System Sciences, 2018. 22: p. 6005-6022.

17. Huang, I.-H., M.-J. Chang, and G.-F. Lin, An optimal integration of multiple machine learning techniques to real-time reservoir inflow forecasting. Stochastic Environmental Research and Risk Assessment, 2022. 36: p. 1541-1561.

18. Tian, D., et al., A hybrid framework for forecasting monthly reservoir inflow based on machine learning techniques with dynamic climate forecasts, satellite-based data, and climate phenomenon information. Stochastic Environmental Research and Risk Assessment, 2022. 36: p. 2353-2375.

19. Jimeno-Sáez, P., et al., Estimation of Instantaneous Peak Flow Using Machine-Learning Models and Empirical Formula in Peninsular Spain. Water, 2017. 9(347): p. 1-12.

20. Hong, J., S. Lee, and J.H. Bae, Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow. Water, 2020. 12(2927): p. 1-18.

21. Gabriel-Martin, I., A. Sordo-Ward, L. Garrote, and I. Granados, Hydrological Risk Analysis of Dams: The Influence of Initial Reservoir Level Conditions. Water, 2019. 11(461): p. 1-17.

22. Senent-Aparicio, J., et al., Coupling machine-learning techniques with SWAT model for instantaneous peak flow prediction. Biosystems Engineering, 2019. 177: p. 67-77.

23. Wilby, R. and C. Dawson, SDSM 4.2-A decision support tool for the assessment of regional climate change impacts. User manual. 2007.

24. Rossum, G., Python Tutorial (Report CS-R9526). 1995: Amsterdam, the Netherlands.

25. Hunter, J., Matplotlib: A 2d Graphics Environment. Computing in Science & Engineering, 2007. 9(3): p. 90-95.

26. Roberts, W., et al., Hydrostats: A Python package for characterizing errors between observed and predicted time series. Hydrology, 2018. 5(4).

27. Zhang, Q., et al., Automatic procedure for selecting flood events and identifying flood characteristics from daily streamflow data. Environmental Modelling and Software, 2021. 145: p. 1-12.

28. Bartles, M., et al., HEC-SSP Statistical Software Package User's Manual. USACE Hydrologic Engineering Center, 2019.

29. Shoaib, M., A.Y. Shamseldin, and B.W. Melville, Comparative study of different wavelet based neural network models for rainfall–runoff modeling. Journal of Hydrology, 2014. 515: p. 47-58.
