Impact of precipitation extremes on energy production across the São Francisco river basin, Brazil

The Brazilian electrical system (BES) relies heavily on hydrothermal energy, specifically hydroelectric power plants (HPPs), which are highly dependent on rainfall patterns. The São Francisco River Basin (SFRB) is a critical component of the BES, playing a key role in electricity generation. However, climate extremes have increasingly impacted energy production in recent decades, posing challenges for HPP management. This study explores the relationship between extreme precipitation events in the SFRB and two crucial energy variables: Stored Energy (STE) and Affluent Natural Energy (ANE). We analyze the spatial distribution and trends of 11 extreme precipitation indices and investigate the seasonality, trends, and correlations between these energy variables and the extreme indices. Our findings reveal downward trends in both ANE and STE. Additionally, we identify a seasonal pattern influenced by extreme precipitation rates at various time scales. The results indicate that it is possible to estimate ANE and STE efficiently by employing three machine learning (ML) algorithms (Random Forest, Artificial Neural Networks and k-Nearest Neighbors) using extreme precipitation data. These results offer valuable insights for the strategic planning and management of the BES, aiding in decision-making and the development of energy security.


Introduction
The Brazilian electrical system (BES) is continental in extent. The system is dominated by hydroelectric power plants (HPPs), which use large reservoirs to regulate seasonal river flows (Mendes 2019). Because of this dependence on hydroelectric generation, operational planning is sensitive to rainfall variability. While water is a low-cost resource, prolonged rainfall scarcity can lead to energy deficits and the need to dispatch more expensive sources, such as thermal power plants (Luiz Silva et al. 2019).
The centralized operation of HPP reservoirs is carried out by the National Electric System Operator (ONS), which makes it possible to maximize the use of natural flows, reduce water waste, and minimize costs. These operations aim to achieve an optimized balance between minimum cost and maximum operational security, ensuring full demand supply (Mendes 2019).
The BES is characterized by the interconnection between the stages of power generation and transmission (Hidalgo et al. 2020). One prominent watershed is the São Francisco river basin. The São Francisco river plays an important role in supplying electrical energy to the Northeast region of the country. The total installed capacity in the National Interconnected System (NIS) was 161,526 MW as of 31 December 2018, distributed as follows: 63.7% in HPPs, 14.2% in conventional and nuclear thermal power plants, and 22.1% in small hydroelectric power plants (SHPs), biomass, wind, and solar power plants (Daher and Martinez 2019).
The dilemma is to balance intensive use of hydroelectric power, which risks energy scarcity during dry periods, against frequent activation of thermal power plants, which increases production costs and leads to water being wasted during intense rainy periods (Zambom 2008).
The success of BES operation depends on climatic and hydrological factors. The ONS focuses on the most economical energy generation and on system security, considering the storage level and water flow in HPPs as the main variables, represented by Stored Energy (STE) and Affluent Natural Energy (ANE), respectively (Mendes 2019).
The ANE represents the energy that a power plant can produce from the natural inflows into the reservoirs. The corresponding values are presented in average MW or as a percentage of the long-term historical average (LTA). ANE volumes are monitored and forecast relative to the historical average verified since 1931. The STE represents the energy associated with the amount of water available in the reservoir that can be converted into electrical energy by all power plants.
In the Northeast, a significant reduction in precipitation extremes has been observed (Regoto et al. 2021), resulting in a drier climate and prolonged periods of drought. The reduction is more pronounced during the rainy seasons, intensifying water scarcity-related problems. Although water flows can be regulated and greater use of alternative sources is possible, Brazil has faced a series of crises in the water and energy sectors in recent years (Hidalgo et al. 2020).
During the period from 2012 to 2017, the contribution of HPPs to meeting the total electricity demand in the Northeast was, on average, only 31%. Furthermore, in November 2015 and 2017, the reservoirs of the São Francisco river reached their lowest level since the construction of the dams in 1994, with only 5% of STE. This situation is directly associated with the occurrence of extreme weather events that result in dry periods in the region (de Jong et al. 2018).
Prolonged drought periods have contributed to the intensification of events such as the water crisis of 2014-2015 in the Southeast region and recurrent drought episodes in the Northeast (Avila-Diaz et al. 2020).
Analyses by Oliveira et al. (2021) revealed sub-regions of the São Francisco river basin that are vulnerable to intensified hydrological and geological processes due to human activity and the recurrence of hydrometeorological phenomena.
According to Silveira et al. (2016), studies aiming to identify climate variability in the region contribute to the reformulation of water resource management policies and increase system resilience in the face of climate change challenges. Additionally, Lucas et al. (2021) emphasize that the assessment of climate extremes in a specific area is a fundamental tool for water resource management, decision-making in planning, and the formulation of environmental protection policies. Thus, analyses of precipitation extremes can be useful in guiding the formulation of strategies for mitigating climate disasters in areas with high vulnerability (Silva et al. 2022). A deeper investigation of how these precipitation extremes are connected with the energy system, in terms of quantities such as STE and ANE, is therefore crucial to characterize the temporal impact of climate on the Brazilian electrical system.
This study therefore analyzes whether extreme rainfall events occurring in the São Francisco river basin affected the energy variables from 2000 to 2019. Moreover, using machine learning algorithms, we assess the potential to deliver a conceptual model that links climate indices and energy variables.

Study Area
The São Francisco river basin (Fig. 1) covers 8% of the Brazilian territory, with a river length of 2,863 km and a drainage area of over 639,219 km². The São Francisco river rises in the Serra da Canastra, in the state of Minas Gerais (Southeastern Brazil), and flows into the Atlantic Ocean along the border between the states of Alagoas and Sergipe (Northeastern Brazil).
The upper São Francisco region comprises the area from the source of the São Francisco river to the city of Pirapora-MG (110,696 km², corresponding to 17% of the basin's surface area). The middle São Francisco extends from Pirapora-MG to Remanso-BA (322,140 km²; 50% of the basin). The sub-middle São Francisco hydrographic region covers the stretch from Remanso-BA to Paulo Afonso-BA (168,528 km²; 26% of the basin). Finally, the lower São Francisco encompasses the stretch from Paulo Afonso-BA to the mouth of the São Francisco river (36,959 km²; 6% of the basin).
Among the main reservoirs in the São Francisco river basin that are used for flow control and/or hydroelectric power generation, the following stand out: Três Marias, located in the state of Minas Gerais; Sobradinho, Paulo Afonso, and Luiz Gonzaga (Itaparica) in Bahia; and Xingó, situated between the states of Alagoas and Sergipe. The Paulo Afonso hydroelectric complex, composed of Paulo Afonso I, II, III, IV, and Apolônio Sales (Moxotó), along with the Xingó hydropower plant, uses run-of-river reservoirs for power generation. In contrast, the other HPPs in the São Francisco river basin (Retiro Baixo, Queimado, Luiz Gonzaga, Três Marias, and Sobradinho) operate with flow control reservoirs, which have different characteristics, including larger useful volumes than run-of-river reservoirs.

Energy Data
Regarding the production of electrical energy, data on Affluent Natural Energy (ANE) and Stored Energy (STE) were used. Data on ANE and STE for Brazilian hydro energy basins are available on a daily basis through the database of the ONS, with information from 2000 to the present day. These data can serve as input for energy studies. However, the data are part of a recurrent consistency process and may undergo updates.

Climate Data
Two precipitation data sets are used. The first data set is the Brazilian Daily Weather Gridded Data (BR-DWGD) (Xavier et al. 2022), which consists of daily and monthly meteorological data in grid format, covering the period from January 1, 1961, to July 31, 2020, with a spatial resolution of 0.1º × 0.1º. These data were generated from six distinct interpolation methods using meteorological and rainfall station data (Xavier et [...]

Data pre-processing

Energy Data
For the study of the ANE and STE variables, raw and relative values from the available series were selected. For ANE, the raw value in MW mean (Gross ANE) and the percentage value relative to the long-term average (ANE (%)) were used. For STE, the value verified on the day in MW month (Gross STE) and the same value in percentage terms (STE (%)) were obtained, considering that the maximum storage capacity is reached when all reservoirs in the basin are full.
The daily ANE and STE series were transformed into monthly series for the period from 2000 to 2022.
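The daily-to-monthly transformation can be illustrated with a minimal sketch. The text does not state the aggregation function, so the monthly mean used here is an assumption; the function name `monthly_means` and the sample values are hypothetical and serve only to show the mechanics.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import fmean


def monthly_means(daily):
    """Average a {date: value} daily series into {(year, month): mean}.

    Assumption: months are summarized by their arithmetic mean.
    """
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    # Sort the keys so the monthly series is in chronological order.
    return {key: fmean(values) for key, values in sorted(buckets.items())}


# Hypothetical daily Gross ANE values (MW mean) for January and February 2000.
start = date(2000, 1, 1)
daily_ane = {start + timedelta(days=i): 5000.0 + 10.0 * i for i in range(60)}
monthly_ane = monthly_means(daily_ane)
```

With a tabular library the same step is typically a single resampling call; the explicit version above only makes the grouping rule visible.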

Climate Data
The daily precipitation series from the BR-DWGD dataset were used, and the hourly data from ERA5-Land were aggregated to generate daily precipitation series. Both datasets were clipped to the area of the São Francisco river basin for the period from 1990 to 2019, representing the most recent 30 years common to both datasets.

Precipitation Climate Extremes
In order to analyze the magnitudes and seasonality of rainfall in the basin, maps of annual and monthly average precipitation were generated across the study area. The Expert Team on Climate Change Detection and Indices (ETCCDI) of the World Meteorological Organization (WMO) has developed 27 indicators based on daily data of maximum and minimum temperature, as well as precipitation (Karl et al. 1999; Frich et al. 2002). In this study, the indicators listed in Table 1 were selected for the analysis of annual and/or monthly frequency of precipitation extremes. These indicators were calculated using the xclim library (Logan et al. 2023).
Days with daily precipitation equal to or greater than 1 mm were considered wet days, while dry days were defined as those with daily precipitation lower than 1 mm. For the 11 indices, the annual averages of the corresponding precipitation extremes were calculated, allowing for the analysis of spatial distribution and trends. For each of the nine indices calculated on a monthly basis, four additional series were generated, consisting of the sum of extreme values from the previous 3, 6, 12, and 24 months. This provided monthly, quarterly, semi-annual, annual, and biennial accumulated values for each precipitation extreme index.
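To make the index definitions and the accumulation step concrete, the following sketch computes a subset of the indices from one period of daily totals and then builds trailing sums. It is an illustration in plain Python, not the xclim implementation used in the study. Several points are assumptions: the definitions follow the usual ETCCDI conventions (PRCPTOT and SDII computed over wet days only, R20mm counting days with at least 20 mm), the trailing window is taken to include the current month, and the function names are hypothetical.

```python
WET_DAY_MM = 1.0  # wet-day threshold stated in the text


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def extreme_indices(pr):
    """Indices for one period of daily totals (mm); pr must be non-empty."""
    wet = [p >= WET_DAY_MM for p in pr]
    n_wet = sum(wet)
    wet_total = sum(p for p, w in zip(pr, wet) if w)
    # Totals of every window of five consecutive days (one window if shorter).
    five_day = [sum(pr[i:i + 5]) for i in range(max(len(pr) - 4, 1))]
    return {
        "WD": n_wet,
        "DD": len(pr) - n_wet,
        "CWD": longest_run(wet),
        "CDD": longest_run(not w for w in wet),
        "PRCPTOT": wet_total,                        # assumed: wet days only
        "SDII": wet_total / n_wet if n_wet else 0.0,
        "RX1DAY": max(pr),
        "RX5DAYS": max(five_day),
        "R20mm": sum(p >= 20.0 for p in pr),         # assumed: >= 20 mm
    }


def trailing_sum(monthly, window):
    """Sum over the current month and the previous window-1 months.

    Months without a full window of history are returned as None.
    """
    return [
        sum(monthly[i - window + 1:i + 1]) if i >= window - 1 else None
        for i in range(len(monthly))
    ]


# Hypothetical ten-day sample of daily precipitation (mm).
sample = [0.0, 0.5, 12.0, 25.0, 3.0, 0.0, 0.0, 0.0, 30.0, 1.0]
indices = extreme_indices(sample)
```

Applying `trailing_sum` with windows of 3, 6, 12, and 24 to each monthly index series yields the quarterly to biennial accumulations described above.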

Trends
The Mann-Kendall test (Mann 1945; Kendall 1948) and Sen's slope estimator (Sen 1968; Theil 1992) were employed to examine the trends in the monthly series of energy variables and the annual series of extreme precipitation indices. The non-parametric Mann-Kendall test was applied at the 95% confidence level. Both the Mann-Kendall test and Sen's slope were computed with the PyMannKendall library for Python (Hussain and Mahmud 2019).
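For readers unfamiliar with the procedure, the sketch below spells out the classical two-sided Mann-Kendall test (with the standard tie correction and continuity correction) and Sen's slope in plain Python. It is a didactic re-implementation under those textbook definitions, not the PyMannKendall code used in the study; the function name and the returned dictionary keys are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (tie-corrected) with Sen's slope."""
    n = len(series)
    pairs = list(combinations(range(n), 2))  # all index pairs with i < j
    # S statistic: increasing pairs minus decreasing pairs.
    s = sum((series[j] > series[i]) - (series[j] < series[i]) for i, j in pairs)
    # Variance of S under the no-trend hypothesis, corrected for tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    # Standard normal score with the usual continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    # Sen's slope: median of all pairwise slopes, in units per time step.
    slope = median((series[j] - series[i]) / (j - i) for i, j in pairs)
    if p_value <= alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return {"trend": trend, "p_value": p_value, "z": z, "sen_slope": slope}


# Hypothetical annual index values that fall by one unit per year.
result = mann_kendall([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
```

The default `alpha=0.05` corresponds to the 95% confidence level stated above; a series is labeled as trending only when the p-value does not exceed it.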

Correlations
The Pearson correlation coefficient (r) was employed to investigate the relationships between energy variables, as well as to identify the association between monthly series of ANE and STE and precipitation extreme indices at different time scales, including monthly, quarterly, semi-annual, annual, and biennial. The value of r can be interpreted as shown in Table 2.
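As a brief illustration, the coefficient can be computed between a monthly energy series and an accumulated index as sketched below. The helper name `pearson_r` and the sample values are hypothetical; skipping pairs with a missing value is an assumption made so that the first months of an accumulated series, which lack a full window, do not enter the calculation.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation, skipping pairs where either value is None."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    mean_x = fmean(a for a, _ in pairs)
    mean_y = fmean(b for _, b in pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical six-month accumulated precipitation (mm) and Gross ANE (MW mean).
prcptot_6 = [None, None, 310.0, 420.0, 500.0, 380.0, 150.0]
gross_ane = [4000.0, 4500.0, 5200.0, 7100.0, 8800.0, 6900.0, 3000.0]
r = pearson_r(prcptot_6, gross_ane)
```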

Use of Artificial Intelligence tools for generating predictive regression models
Predicting energy variables from precipitation extremes may be crucial to understanding the relationships between these variables. For this purpose, three machine learning regression methods were used to estimate the respective energy variables on a monthly basis, taking as input the precipitation extreme indices whose Pearson correlation (r) with the target was identified as moderate or strong.
The methods used are Random Forest (RF), which is an estimator that consists of a set of decision trees (Breiman 2001; Hastie et al. 2009); Artificial Neural Networks (ANN), which is a method of data processing inspired by the transfer of data in biological neural systems (Lee et al. 2008); and k-Nearest Neighbors (kNN), which is a non-parametric method that estimates the relationship between inputs and outputs without any predetermined assumptions (Anaraki et al. 2021).
The appropriate selection of regression model parameters is crucial for obtaining accurate results. To determine the best parameter values for each regression method, the GridSearchCV algorithm was used. This algorithm performs an exhaustive search over a predefined grid of hyperparameters, evaluating the model's performance for each parameter combination (Ranjan et al. 2019).
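The idea of an exhaustive grid search can be sketched without any external library. The example below tunes a small k-nearest-neighbors regressor by scoring every hyperparameter combination with cross-validated RMSE. It is a simplified stand-in for GridSearchCV, not its implementation: the grid values, the interleaved fold assignment, the RMSE criterion, and all function names are assumptions chosen for illustration, since the grids actually searched are not reported in the text.

```python
import math
from itertools import product
from statistics import fmean


def knn_predict(train_x, train_y, query, k, weighted):
    """Predict one target as the mean of the k nearest training targets."""
    nearest = sorted(
        (math.dist(row, query), target) for row, target in zip(train_x, train_y)
    )[:k]
    if weighted:
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)


def cv_rmse(x, y, k, weighted, n_folds=5):
    """Average RMSE of the kNN regressor over n_folds interleaved folds."""
    n = len(x)
    fold_scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_x = [x[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        errors = [
            (knn_predict(train_x, train_y, x[i], k, weighted) - y[i]) ** 2
            for i in sorted(test_idx)
        ]
        fold_scores.append(math.sqrt(fmean(errors)))
    return fmean(fold_scores)


def grid_search(x, y, grid, n_folds=5):
    """Score every combination in the grid; return (best_score, best_params)."""
    names = list(grid)
    scored = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scored.append((cv_rmse(x, y, n_folds=n_folds, **params), params))
    return min(scored, key=lambda item: item[0])


# Hypothetical data: one predictor (an accumulated index) and a linear target.
x = [[float(i)] for i in range(40)]
y = [2.0 * i for i in range(40)]
grid = {"k": [1, 3, 5, 15], "weighted": [False, True]}  # illustrative grid
best_score, best_params = grid_search(x, y, grid)
```

The cost of the search grows with the product of the grid sizes, which is why the grid is normally restricted to a few plausible values per hyperparameter.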

Cross-validation
Cross-validation is a statistical method used to evaluate the performance of machine learning models, in which the main goal is to assess the model's ability to generalize to unknown data (Chen et al. 2021).
This technique was applied by dividing the data into five subsets, keeping one subset as the test set and the others as the training set. This process is repeated five times, ensuring that each subset is used as the test set exactly once. Evaluation metrics are calculated for each round, and the average is taken across the five results. This procedure is repeated 30 times, totaling 150 tests. Results are presented in box plots, showing the average of the evaluation metrics obtained in each of the 30 repetitions.
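The repeated five-fold scheme can be sketched as follows. The sketch assumes that the data are reshuffled before each repetition, which the text does not state but which is needed for the 30 repetitions to differ; a mean-value predictor stands in for the fitted regression model, and the function names are hypothetical.

```python
import random
from statistics import fmean


def repeated_kfold(n_samples, n_folds=5, n_repeats=30, seed=0):
    """Yield (repeat, train_idx, test_idx) for every fold of every repetition."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # assumed: a new random partition per repetition
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, train_idx, test_idx


def evaluate(y, n_folds=5, n_repeats=30):
    """Return one fold-averaged MAE per repetition for a mean-value baseline."""
    per_repeat = [[] for _ in range(n_repeats)]
    for repeat, train_idx, test_idx in repeated_kfold(len(y), n_folds, n_repeats):
        prediction = fmean(y[i] for i in train_idx)  # stand-in for a fitted model
        mae = fmean(abs(y[i] - prediction) for i in test_idx)
        per_repeat[repeat].append(mae)
    return [fmean(scores) for scores in per_repeat]


y = [float(v) for v in range(100)]  # hypothetical monthly target series
scores = evaluate(y)
```

Here `scores` holds 30 values, one per repetition, which is the quantity a box plot of the repetitions would summarize.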

Assessment Metrics
To assess the quality of the predictions obtained by the regression models, five metrics were used. The Mean Absolute Error (MAE) (Eq. 1) is calculated as the average of the absolute differences between the predicted values and the observed values. It provides a direct measure of the average error of the predictions. The Mean Absolute Percentage Error (MAPE) (Eq. 2) calculates the average of the absolute percentage errors, providing a measure of the average error relative to the true values.
The Root Mean Squared Error (RMSE) (Eq. 3) is another widely used metric that calculates the square root of the average of the squared errors between the predictions and the true values. This metric penalizes larger errors more heavily and is sensitive to outliers.
The Kling-Gupta Efficiency (KGE) coefficient (Eq. 4), proposed by Gupta et al. (2009) and modified by Kling et al. (2012), is a measure of efficiency that compares the variability, bias, and correlation of predictions with the true values. This coefficient ranges from -∞ to 1, where values close to 1 indicate a good fit of predictions to the observed data.
Lastly, Willmott's concordance coefficient (d) (Eq. 7) is used to evaluate the comparative accuracy of the estimates by calculating the discrepancy between the values of one estimation relative to another (Willmott et al. 1985). The values of this index range from 0 to 1, representing no agreement and perfect agreement, respectively. In the equations, n is the number of data points, µ is the mean value, CV is the coefficient of variation, P is the prediction value, and O is the true value.
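A compact sketch of the five metrics is given below, written with the symbols defined above (O observed, P predicted, µ mean, CV coefficient of variation). The formulas are the standard textbook forms and are assumed, not confirmed, to match Eqs. 1 to 7; in particular, the KGE follows the modified form with the bias ratio µ_P/µ_O and the variability ratio CV_P/CV_O, and d is Willmott's original index. Function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def mae(obs, pred):
    """Mean absolute error: mean of |P - O|."""
    return fmean(abs(p - o) for o, p in zip(obs, pred))


def mape(obs, pred):
    """Mean absolute percentage error (%); undefined if any O is zero."""
    return 100.0 * fmean(abs((p - o) / o) for o, p in zip(obs, pred))


def rmse(obs, pred):
    """Root mean squared error: sqrt of the mean of (P - O)^2."""
    return math.sqrt(fmean((p - o) ** 2 for o, p in zip(obs, pred)))


def kge_modified(obs, pred):
    """Modified KGE: 1 - sqrt((r-1)^2 + (beta-1)^2 + (gamma-1)^2)."""
    mu_o, mu_p = fmean(obs), fmean(pred)
    sd_o, sd_p = pstdev(obs), pstdev(pred)
    cov = fmean((o - mu_o) * (p - mu_p) for o, p in zip(obs, pred))
    r = cov / (sd_o * sd_p)                # linear correlation
    beta = mu_p / mu_o                     # bias ratio
    gamma = (sd_p / mu_p) / (sd_o / mu_o)  # ratio of coefficients of variation
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


def willmott_d(obs, pred):
    """Willmott's d: 1 - sum (P-O)^2 / sum (|P - mean(O)| + |O - mean(O)|)^2."""
    mu_o = fmean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mu_o) + abs(o - mu_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical observed and predicted values with a constant offset of one.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
summary = {
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "KGE": kge_modified(observed, predicted),
    "d": willmott_d(observed, predicted),
}
```

In this example the correlation is perfect, so the KGE penalty comes entirely from the bias and variability ratios, which shows how the metric separates the three sources of disagreement.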

Results and Discussion
[...] São Francisco meso-region, indicating that about 60% of the total precipitation in the rainiest areas occurs in the wettest quarter.

Precipitation Climate Extreme Index
The indicators of wet days (WD) and consecutive wet days (CWD) show higher values in the upper São Francisco meso-region, the western part of the middle São Francisco, and the eastern area of the lower São Francisco. On average, there are more than 85 rainy days per year in these regions, with consecutive rainfall events lasting between 16 and 24 days. In contrast, low values of these indices are observed in the sub-middle São Francisco meso-region, where most of the area records an average of 30 to 60 rainy days per year and only 8 to 12 consecutive rainy days.
The RX5DAYS and R20mm indices show that extreme precipitation events are concentrated in the upper and middle São Francisco meso-regions, which have the highest values of both indicators. RX5DAYS measures the maximum precipitation accumulated over five consecutive days and records values above 120 mm in these meso-regions. R20mm, which measures the frequency of days with precipitation exceeding 20 mm, indicates more than 11 days above this threshold.
RX1DAY (maximum precipitation in a single day) is more evenly distributed across the basin, with values exceeding 50 mm in all meso-regions and exceeding 60 mm in the upper and middle São Francisco. The SDII indicator (mean daily intensity), on the other hand, differs between the datasets. For BR-DWGD, higher magnitudes (above 10 mm/day) are observed in the upper and middle São Francisco meso-regions, with values between 8 and 9 mm/day in the sub-middle São Francisco meso-region. For ERA5-Land, the highest SDII values also occur in the upper and middle São Francisco (between 8 and 9 mm/day), with minimum values in the other areas of the basin.
The prominent extreme indices in the lower São Francisco are consistent with a recent study by Morales et al. (2023), who observed that the eastern coast of the Northeast region experiences a higher number of extreme precipitation events than the semiarid region. Regarding the meso-region most affected by precipitation extremes, there is agreement with Jeferson de Medeiros et al. (2022), who found that extreme precipitation events (PRCPTOT, RX1day, RX5day, SDII, R20mm, CWD) are more intense, frequent, and prolonged in the northern, southern, and southeastern regions of Brazil. During the rainy season, the South Atlantic Convergence Zone (SACZ) has a significant influence on the rainfall regime in the southeastern region of Brazil (Marengo et al. 2015; Nielsen et al. 2019; Rosa et al. 2020). The study by da Fonseca Aguiar and Cataldi (2021) demonstrates that the frequent occurrence of these extreme events is caused by the influence of the SACZ. Their results reveal that the average conditional probability of the occurrence of the SACZ, when there are disasters due to intense or persistent rainfall events in the Southeast, is 48%, and in Minas Gerais this probability is even higher, reaching 50%.
Lastly, it is worth noting the indicators DD (Dry Days), CDD (Consecutive Dry Days), and PRCDQ (Driest Quarter Precipitation). The highest number of dry days (from 280 to 320 days) is observed throughout the sub-middle São Francisco meso-region and the eastern part of the middle São Francisco, which also records the maximum number of CDD (from 145 to 190 days). These results suggest that, on average, this area goes without precipitation for roughly five to six consecutive months, resulting in high vulnerability to drought events. Precipitation in the driest quarter is higher predominantly in the lower São Francisco meso-region than in the other meso-regions.
These analyses corroborate the study by Jeferson de Medeiros et al. (2022), which found lower intensity and frequency of extreme precipitation events in Northeast Brazil. In this region, there is a high number of Consecutive Dry Days (CDD) in the central portion of the Northeast, which has a semiarid climate. However, this does not mean that extreme rainfall does not occur in the Northeast, as the RX1DAY and SDII indices stand out. According to Monteiro (2022), in the semiarid area of the Northeast, when atmospheric conditions result from the interaction of two or more meteorological systems, such as upper tropospheric cyclonic vortices (UTCVs), easterly wave disturbances (EWD), and the Intertropical Convergence Zone (ITCZ), the result is frequently significant rainfall that constitutes a genuine extreme event.
These results highlight the spatial heterogeneity of extreme precipitation events in the São Francisco river basin, which requires the adoption of specific strategies for each hydrographic meso-region in order to promote efficient water resources management. It is also important to understand the seasonal behavior of the extreme indices (Fig. 3).
The distribution of average PRCPTOT values (Fig. 3a) throughout the year reveals a rainy period between November and March, with monthly averages above 100 mm in both datasets. A dry period occurs in the basin from May to September, with monthly total precipitation below 30 mm. October and April are transitional months between the rainy and dry periods, with an average of approximately 60 mm.
The extreme indices related to heavy rainfall (WD, CWD, SDII, RX1DAY, RX5DAYS, and R20mm) (Figs. 3a to 3g) reach their maximum during the rainy period, between November and March. This indicates a strong influence of these extreme events on the precipitation regime of the São Francisco river basin. The presence of these extreme events during the rainy season is consistent with the influence identified by da Fonseca Aguiar and Cataldi (2021), who showed a significant association between natural disasters and the SACZ in November, December, and January. According to the authors, more than half of all SACZ-related events in the Southeast region occurred during these months.
In contrast, the extreme indices related to drought reach their maximum monthly averages between June and September. During this period, on average, 25 days of each month are characterized as dry (Fig. 3h), with at least 20 of these days being consecutive (Fig. 3i), which marks the dry period in the basin.

Trends in Precipitation Climate Extreme Indexes
Significant trends calculated for the annual extreme precipitation indices over the study period are shown in Fig. 4. Only significant values (p ≤ 0.05) are highlighted; white areas within the basin therefore either exhibit no trend or exhibit trends that are not significant.
Trends of extreme precipitation indices differ in spatial distribution and magnitude between the two datasets. This behavior is consistent with Regoto et al. (2021), who state that comparing different datasets and methods can lead to large variations and uncertainties in climate extreme trends. A negative trend is observed throughout the basin for the PRCPTOT, WD, CWD, PRCWQ, and R20mm indices, with larger magnitudes in the ERA5-Land dataset. In most cases, the two datasets deliver similar spatial patterns but different magnitudes.
These results are similar to those of Avila-Diaz et al. (2020), who found predominantly negative trends in the São Francisco basin for the PRCPTOT, R20mm, and CWD indices, although only CWD showed a significant trend. Downward trends are observed in the sub-middle São Francisco meso-region and the northern portion of the middle São Francisco for the SDII, RX1DAY, and RX5DAYS indices, especially in the ERA5-Land dataset. This diverges from the results obtained with BR-DWGD (Fig. 4a), which generally do not indicate significant trends in these areas and show positive trends in the middle São Francisco for these indicators.
An increasing trend in the DD index is observed throughout the entire São Francisco river basin, with larger magnitudes in ERA5-Land, which also reveals positive trends in CDD in all hydrographic meso-regions of the basin. The BR-DWGD dataset, in turn, suggests a relatively milder increasing trend in CDD in the middle São Francisco. The observed reduction in intense rainfall events and increase in drought in the semiarid region of the basin are consistent with the findings of Assis et al. (2022), who emphasize the decrease in total precipitation, daily rainfall, maximum volumes in 1 and 5 consecutive days, rainy days, and the number of days with moderate, heavy, and intense rainfall. Positive trends were only observed in consecutive days without rain, reinforcing the negative trend of rainfall and recurrent droughts in northeastern Brazil.
The PRCDQ index shows downward trends throughout the basin, with the greatest magnitude in the lower São Francisco. In the BR-DWGD dataset, these trends have minimal magnitudes. Climate analyses are essential to understand how climate extremes affect water resource availability for energy generation. Changes in extreme precipitation trends can directly impact hydroelectric power production. Therefore, efficient energy production, water resource management, and sustainable energy supply may benefit from a clear understanding of short-time-scale weather events.

Correlations between Climate Extremes and Energy Variables
The Pearson linear correlation coefficients (r) between energy variables and accumulated climate extremes are presented in Table 3. Empty fields in the table indicate that the corresponding correlations did not reach the desired significance level (p ≤ 0.05). The suffixes 1, 3, 6, 12, and 24 indicate the number of months over which each index is summed. Both datasets showed |r| > 0.70 between Gross Affluent Natural Energy (ANE) and all extreme indices accumulated over six months, except for CDD 6, for which the ERA5-Land dataset indicates a moderate correlation. Moderate correlations are also observed with indices accumulated over three months. As for ANE (%), strong correlations are observed with CWD 24 from ERA5-Land and R20mm 24 from BR-DWGD, with agreement between the datasets only for PRCPTOT 24, while correlations are moderate for the other indices accumulated over 24 months, as well as for all indices accumulated over 12 months.
The correlations between Stored Energy (STE), both gross and percentage, and the extreme indices are strong, particularly with RX1DAY 24 and RX5DAYS 24 from the BR-DWGD dataset, as well as with PRCPTOT 24, WD 24, CWD 24, R20mm 24, and DD 24, for which the two data sources agree. For the other extremes accumulated over 24 months and all those accumulated over 12 months, the correlations with stored energy are moderate. The correlations in Table 3 indicate that energy variables can be affected by precipitation-related climate extreme indices at various time scales. They also lend themselves to a physical interpretation: the more active the extremes (PRCPTOT, RX1DAY, RX5DAYS) are in the basin, the greater the water availability; conversely, the weaker the extremes, the more thermal plants will be activated in the system.

Affluent Natural Energy (ANE)
Fig. 5 presents the historical time series and monthly averages of Gross ANE and Percentage of ANE from January 2000 to December 2022. Both time series (Figs. 5a and 5b) show a cyclic oscillation, which can be explained by the monthly averages of the historical series (Figs. 5c and 5d) and their relationships with extreme precipitation indices.
The influence of extremes in the São Francisco river basin is observed through the short-term response of Gross Affluent Natural Energy, with stronger correlations occurring in periods of three to six months (Table 3). This relationship reveals a well-defined seasonal behavior, with minimum values recorded in September, the last month of the driest quarter, at approximately 2,000 MW mean. The values of Gross ANE increase in the following months and peak in February, the fourth month of the rainy season, at over 10,000 MW mean, before decreasing in the subsequent months.
This can be explained by the impact of extremes in the basin, which change the natural flow of the main river channel over short periods. Percentage of ANE, on the other hand, incorporates long-term averages in its determination and therefore shows long-term responses, ranging from 12 to 24 months.
Seasonality was also observed by Vilar et al. (2020); according to the authors, the time series shows a decrease in observed ANE values due to the dry season. Percentage of ANE also exhibits seasonality, but with changes throughout the year. The lowest monthly average value, around 62%, is recorded in October, a transitional month between the dry and rainy periods in the basin. From there, the values gradually increase to reach a maximum of approximately 78% in January, the third month of the rainy season. Subsequently, the values decrease, with increases observed in April and June compared to the previous month.
These increases may be related to the R20mm index, which is well correlated with the percentage of ANE when accumulated over 24 months (Table 3). In May, on average, up to two events of intense precipitation can occur in the basin (Fig. 3g), influencing the natural river flow in the subsequent month.
The slight increase observed in June is likely associated with the total precipitation in the lower São Francisco hydrographic meso-region and the eastern extreme of the sub-middle São Francisco from May to August (Fig. 3). Although these months are predominantly dry in the other meso-regions of the basin, the Paulo Afonso hydroelectric complex and the Xingó HPP are located in these regions (Fig. 1) and use run-of-the-river reservoirs for energy generation, contributing significantly to ANE (%) in the basin.
It is also worth noting that ANE (%) is based on a long-term average (LTA) that has been verified since 1931 and, therefore, may contain climatological patterns not identified in the temporal scope of this study.

Stored Energy (STE)
Fig. 6 presents the historical time series of gross Stored Energy (Gross STE) and Percentage of STE, together with their monthly averages, for the interval from January 2000 to December 2022.
Notably, both time series (Figs. 8a and 8b) and the monthly average curves (Figs. 8c and 8d) exhibit identical behavior. This is because, throughout the analyzed period, the maximum STE of the São Francisco river basin remained steady at 52,727 MW.month, according to information from [20]. The cyclic oscillation of the time series can be explained by their respective monthly averages and their association with the extreme precipitation climate indices.
The response of STE (gross and percentage) shows a long-term relationship with precipitation extremes, as the stronger correlations occurred for 24-month periods (Table 3). The Gross STE and Percentage of STE patterns (Figs. 8c and 8d, respectively) exhibit a clear seasonal cycle. The minimum value is observed in November, at the beginning of the rainy season, at approximately 17,500 MW.month, corresponding to 35% of the maximum energy production capacity of reservoir-based power plants.
Over the following months, this value gradually increases, reaching its maximum in May, during the dry period of the basin, at around 35,000 MW.month, or approximately 66% in terms of the percentage of STE. This behavior can be attributed to the control of water levels in the reservoirs, which have a considerable storage volume and thus give the system resilience to short-term variations in water availability.
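Because the maximum STE is constant over the analyzed period, the percentage of STE is the gross value rescaled by a fixed factor, which is why the gross and percentage curves share the same shape. A minimal sketch of this conversion, assuming the percentage is simply gross STE divided by the quoted maximum of 52,727 MW.month (the function name is hypothetical):

```python
MAX_STE = 52_727.0  # MW.month, constant maximum STE quoted in the text


def ste_percent(gross_ste, max_ste=MAX_STE):
    """Express gross stored energy as a percentage of the maximum STE."""
    return 100.0 * gross_ste / max_ste


# May maximum quoted in the text (about 35,000 MW.month)
print(round(ste_percent(35_000.0), 1))  # 66.4
```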
Analyses of the average monthly values of ANE (Fig. 5) and STE (Fig. 6) reveal a significant difference in magnitude between the two quantities. This difference arises mainly because, in addition to differences in working volumes, HPPs with flow-control reservoirs can store water and therefore generate electrical energy during periods of high demand (STE), while run-of-the-river HPPs have their electricity production limited by the available river flow (ANE).

Trends of Energy Variables
Results obtained based on the Mann-Kendall trend test for the energy variables are presented in Table 4.
The results presented in this section provide a comparative analysis of evaluation metrics for the predictive regression models applied to two datasets: BR-DWGD, represented in blue, and ERA5-Land, represented in green. Fig. 7a shows that the predictive models for Gross ANE yielded close and satisfactory results for both datasets. The KNN model demonstrated the best performance in all metrics for ERA5-Land. For the BR-DWGD dataset, the KNN model also delivered the best performance in terms of KGE, d, and MAPE, but with relatively large dispersion.
Regarding the predictions of ANE (%) (Fig. 7b), the results were similar and acceptable for all three models in both datasets. The RF model showed the best performance with the BR-DWGD data in all metrics. For the ERA5-Land dataset, on the other hand, the performance of the RF model was comparable to that of the KNN model.
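For reference, the three evaluation metrics named above can be sketched as follows. This assumes that KGE follows the original Kling-Gupta formulation (correlation, variability ratio, and bias ratio), that d is Willmott's index of agreement, and that MAPE is the mean absolute percentage error; the exact variants used in the study are not stated in this section, so the definitions below are assumptions.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _std(v):
    # Population standard deviation
    m = _mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def kge(obs, sim):
    """Kling-Gupta efficiency; 1 is a perfect match.

    Undefined if `obs` or `sim` is constant or if mean(obs) is zero.
    """
    mo, ms = _mean(obs), _mean(sim)
    so, ss = _std(obs), _std(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def willmott_d(obs, sim):
    """Willmott's index of agreement; ranges from 0 to 1 (perfect)."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def mape(obs, sim):
    """Mean absolute percentage error, in percent; obs must be non-zero."""
    return 100.0 * _mean([abs((o - s) / o) for o, s in zip(obs, sim)])
```

Under these definitions, higher KGE and d and lower MAPE indicate better agreement between observed and estimated energy values.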
The estimations of stored energy, in both gross form (Fig. 8a) and percentage form (Fig. 8b), also exhibited satisfactory results for the BR-DWGD and ERA5-Land datasets. Understanding the relationships between extreme climate events and energy variables, especially the ability to efficiently estimate ANE and STE from information on extreme precipitation events, presents significant potential for future applications in energy production planning in Brazil.
These findings are particularly relevant when considering extreme events predicted for the future, such as prolonged droughts or heavy rainfall, which can have substantial impacts on energy supply. The use of such information can contribute to more resilient and adaptable energy planning that takes into account the challenges arising from extreme climate events.
Although the models evaluated in this study have demonstrated efficiency in estimating STE and ANE from information on extreme precipitation, the inclusion of other climate variables can further enhance the accuracy of predictive models. In addition to precipitation, factors such as temperature and regional climate patterns play crucial roles in energy production. Therefore, future research exploring the integration of other climate variables into the models can improve the ability to accurately predict energy availability during extreme climate events.
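To make the estimation setup discussed in this section concrete, the sketch below shows a k-nearest-neighbors regressor, the simplest of the three algorithm families evaluated, written in plain Python. It is an illustration only and not the study's implementation: the z-score scaling, the Euclidean distance, the value of k, and all names are assumptions. Each row of `train_X` holds the predictors for one month (for example, several extreme precipitation indices), so an additional climate variable such as temperature would enter simply as an extra column.

```python
import math


def fit_scaler(rows):
    """Column-wise mean and standard deviation of the training predictors."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # guard constant columns
        for c, m in zip(cols, means)
    ]
    return means, stds


def scale(row, means, stds):
    """Z-score one row so predictors with different units are comparable."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stds))


def knn_regress(train_X, train_y, query, k=3):
    """Estimate the target for `query` as the mean target of its k nearest
    training rows (Euclidean distance in standardized predictor space)."""
    means, stds = fit_scaler(train_X)
    scaled = [scale(x, means, stds) for x in train_X]
    q = scale(query, means, stds)
    order = sorted(range(len(scaled)), key=lambda i: math.dist(scaled[i], q))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)
```

In practice, k and the choice of predictors would be selected by validation against held-out months, using metrics such as KGE, d, and MAPE.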

Conclusions
Analysis of extreme precipitation events in the São Francisco river basin reveals distinct patterns in different regions. The Upper São Francisco meso-region and the western portion of the middle São Francisco are the most impacted by extreme rainfall events, which is related to the rainy season in the basin.
On the other hand, the sub-middle São Francisco and the eastern region of the middle São Francisco are more susceptible to extreme drought events, recording the highest magnitudes throughout the year. Trend tests indicate a reduction in total precipitation and the number of rainy days, as well as an increase in the number of dry days. Seasonal analysis of Stored Energy (STE) and Affluent Natural Energy (ANE) reveals a behavior influenced by precipitation extremes on different time scales. The short-term responses of Gross ANE levels and the long-term responses of ANE (%) and STE (both gross and percentage) to extreme climatic events are evident.
The regression models based on artificial intelligence methods demonstrated efficient performance in estimating ANE and STE based on extreme precipitation events in the basin. These results pave the way for further research and practical applications, including forecasts of future energy conditions based on climate extremes and the consideration of additional weather variables to enhance the efficiency of the models.
These findings are relevant to the planning and management of the electricity sector, especially in relation to strategic decision-making and the development of public policies aimed at ensuring the energy security of the country.

Figure 2
Figure 2 presents the results of two datasets, BR-DWGD and ERA5-Land (Figs. 4a and 4b, respectively), which are consistent regarding the spatial distribution of annual mean precipitation extremes in the São Francisco river basin. Among the analyzed variables, the indicators PRCPTOT (Total Precipitation) and PRCWQ (Wettest Quartile Precipitation) stand out. They have the highest values in the upper and middle

Figure 2 Annual

Figure 3 Monthly

Figure 5 Monthly

Figure 6 Monthly

al. 2016) and have been widely utilized in climate studies (de Andrade et al. 2022;

Table 1
Precipitation climate extreme index.

Table 2
Interpretation of Pearson's correlation coefficient value (r).

Table 4
Non-parametric tests for the energy variables.

Use of Artificial Intelligence tools for generating predictive regression models
In the following, Artificial Intelligence tools are used for generating predictive regression models.

Results
Similarly, Da Silva et al. (2021) observed negative trends when analyzing time series of natural inflows into the Itaparica and Sobradinho reservoirs. The authors attributed these trends to the occurrence of droughts recorded in recent years in the São Francisco river basin.
It can be observed that, except for ANN in STE (%), the dispersion of predicted values is relatively low compared to the ANE estimations. The RF model delivers the best performance for STE predictions in all metrics for both datasets. It is noteworthy that, for both gross STE and STE in percentage, the values of d and KGE are close to optimal values, with a low MAPE ranging between 25% and 30%. The results presented in this section indicate that extreme precipitation events are directly related to both the STE in reservoirs and the ANE in the São Francisco river basin. Furthermore, it is feasible to efficiently estimate STE and ANE using only information on extreme precipitation indices in