3.1 Accumulated rainfall (R)
The accumulated rainfall (R) is defined as the sum of precipitation over a given period and area. Here, R⁷ denotes the rainfall accumulated over 7 days, since this accumulation period yielded the maximum correlation with the discharge data in the correlation analysis described above. Henceforth, R⁷ corresponds to the accumulated rainfall fixed in time (RFT). Fig. 3 shows the time series of discharge and R⁷. R⁷ presents a correlation of 0.735 (p value < 0.01) with the discharge estimated for the Valo Grande Channel (VGC) using the method described in Section 2.2 (Fig. S1, available in the Electronic Supplementary Material).
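As a minimal sketch of how an accumulated-rainfall series such as R⁷ can be built, and how an accumulation period can be selected by correlation, the Python code below computes trailing n-day sums and scans candidate windows by Pearson's r. The trailing-sum convention, the 1 to 15 day candidate range and the function names (`accumulate`, `best_window`) are illustrative assumptions, not the study's code; significance testing is omitted.

```python
# Sketch only: trailing n-day accumulated rainfall and a correlation scan over
# candidate windows. Assumptions (not from the study): `rain` and `discharge`
# are aligned, gap-free daily series; each window includes the current day;
# candidate windows of 1-15 days are illustrative.
import math


def accumulate(rain, n):
    """Trailing n-day sum; the first n - 1 entries are undefined (None)."""
    out = [None] * len(rain)
    running = 0.0
    for i, r in enumerate(rain):
        running += r
        if i >= n:
            running -= rain[i - n]  # drop the day that left the window
        if i >= n - 1:
            out[i] = running
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_window(rain, discharge, windows=range(1, 16)):
    """Return (n, r) for the window whose accumulated rainfall best correlates with discharge.

    Each window is scored on the days where its accumulation is defined.
    """
    scores = {}
    for n in windows:
        acc = accumulate(rain, n)
        pairs = [(a, q) for a, q in zip(acc, discharge) if a is not None]
        xs, ys = zip(*pairs)
        scores[n] = pearson(xs, ys)
    n_best = max(scores, key=scores.get)
    return n_best, scores[n_best]
```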
The river discharge peaks coincided with the rainfall peaks, indicating that rainfall modulates the discharge (Fig. 3). For instance, intense rainfall occurred at the beginning of 2011, reaching rates over 10000 m³.s⁻¹, and then decreased progressively until June. The same pattern was observed in the VGC discharge, with a peak of almost 1200 m³.s⁻¹ at the beginning of 2011 followed by a decreasing flux until the end of June. Overall, all rainfall peaks approximately coincided with discharge peaks, showing the coherence between the two datasets.
3.2 Regression models
In this section, we present the model results, focusing on the MLR models, which were considerably more reliable than the linear, quadratic and exponential models. The statistical differences among these models are detailed in Section 3.3, and the reasons for these differences are discussed in Section 4. The detailed results of the linear, quadratic and exponential models are available in the Electronic Supplementary Material (Figs. S2 and S3).
3.2.1 Multiple linear regression models
We present two MLR models. Because the highest correlation between discharge and accumulated rainfall was found for a 7-day accumulation (R⁷), the first multiple regression model uses MERGE R⁷. The second uses the accumulated rainfall varying in time (RVT), in which the accumulation period of each MERGE grid cell is the one yielding the highest correlation with discharge for that cell.
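The structure of the two MLR models can be illustrated with a short ordinary-least-squares sketch in which each column of the predictor matrix holds the accumulated rainfall of one MERGE grid cell (R⁷ for the first model, the cell-specific window for RVT). The helper names `fit_mlr` and `predict_mlr` are hypothetical, and fitting with NumPy's least-squares solver plus an intercept is an assumption for illustration, not a description of the authors' implementation.

```python
# Sketch only: MLR of daily discharge on gridded accumulated rainfall.
# Assumptions (hypothetical): X has one column per MERGE grid cell with that
# cell's accumulated rainfall; an intercept is included; ordinary least squares.
import numpy as np


def fit_mlr(X, q):
    """Return coefficients [intercept, b_1, ..., b_p] minimising ||q - A b||^2."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(q, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted coefficients to predictor rows (test subset or new period)."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]
```

In this sketch, calling `predict_mlr` on predictors for days without discharge observations is how a reconstructed series would be obtained.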
The multiple regression model using R⁷ resulted in minimum, maximum and mean standard errors of 0.003, 0.064 and 0.009, respectively. In 99% of the MERGE grid cells, the standard errors were lower than 0.03; the exceptions were three grid cells in the southern region of the catchment (Fig. S4-a). Thus, the errors associated with the regression were low in almost the entire domain. The p values were lower than 0.05 for 27% of the grid points (69 of 260) (Fig. S4-b).
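Standard errors and p values per coefficient are standard outputs of a regression fit. A minimal sketch of how they can be obtained for an ordinary least squares fit follows; it assumes independent, homoscedastic residuals and uses a normal approximation to the t distribution for the p values, which is reasonable only for large samples. The study's exact inference procedure is not stated here, and autocorrelation in daily discharge could make these classical errors optimistic. The name `ols_inference` is hypothetical.

```python
# Sketch only: coefficient standard errors and approximate two-sided p values
# for OLS. Assumptions: independent, homoscedastic residuals; normal
# approximation to the t distribution (large residual degrees of freedom).
import math

import numpy as np


def ols_inference(X, q):
    X = np.asarray(X, dtype=float)
    q = np.asarray(q, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # intercept + predictors
    coef, *_ = np.linalg.lstsq(A, q, rcond=None)
    resid = q - A @ coef
    dof = A.shape[0] - A.shape[1]                   # n - (p + 1)
    sigma2 = resid @ resid / dof                    # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)           # coefficient covariance
    se = np.sqrt(np.diag(cov))
    t = coef / se
    # two-sided p value under the normal approximation: P(|Z| > |t|)
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return coef, se, p
```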
The black circles in Fig. 4 represent the values of the test subset used to validate the multiple regression models using R⁷ (Fig. 4-a) and RVT (Fig. 4-b). The model using R⁷ presented a skill of 0.91, a CD of 0.71 and an RMSE of 107.05 m³.s⁻¹ (Fig. 4-a). The model using RVT presented a skill of 0.91, a CD of 0.70 and an RMSE of 104.34 m³.s⁻¹ (Fig. 4-b). This validation showed that both the R⁷ and RVT models reproduced the discharge data considerably well for both low and high values (Fig. 4-a and b).
The time-series validation of the multiple regression model using R⁷ showed that the model represented the discharge reasonably well (Fig. 5), presenting a skill of 0.92, CD of 0.64 and RMSE of 108.68 m³.s⁻¹. The modeled time series reflected the main discharge patterns, including the peaks observed in January and June, as well as the lowest values present in the observations. Notably, the seasonal patterns were also present in the reconstructed time series (Fig. 5).
The best results were produced by the RVT model, which uses, for each MERGE rainfall grid cell, the accumulation period with the highest correlation coefficient with the discharge time series. The maximum correlation coefficients (Fig. 6-a) were found for accumulation periods between 6 and 9 days (R⁶ and R⁹, respectively) (Fig. 6-b).
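The RVT construction can be sketched as a per-cell search: for each grid cell, compute the accumulated rainfall for several windows and keep the one most correlated with discharge. In the Python sketch below, the 1 to 15 day candidate range, the alignment that drops the first 14 days so every window is defined, and the names `trailing_sum` and `rvt_predictors` are assumptions made for illustration.

```python
# Sketch only: RVT predictors chosen per grid cell by maximum correlation.
# Assumptions (hypothetical): `rain` is a (days, cells) array of daily MERGE
# rainfall, `q` the aligned daily discharge, candidate windows of 1-15 days.
import numpy as np


def trailing_sum(x, n):
    """Trailing n-day sums along axis 0; row k covers days k .. k + n - 1."""
    c = np.cumsum(np.vstack([np.zeros((1, x.shape[1])), x]), axis=0)
    return c[n:] - c[:-n]                          # shape (days - n + 1, cells)


def rvt_predictors(rain, q, windows=range(1, 16)):
    rain = np.asarray(rain, dtype=float)
    q = np.asarray(q, dtype=float)
    start = max(windows) - 1                       # first day defined for all windows
    best_n = np.zeros(rain.shape[1], dtype=int)
    best_r = np.full(rain.shape[1], -np.inf)
    cols = np.zeros((rain.shape[0] - start, rain.shape[1]))
    for n in windows:
        acc = trailing_sum(rain, n)[start - (n - 1):]  # align rows to day `start`
        for j in range(rain.shape[1]):
            r = np.corrcoef(acc[:, j], q[start:])[0, 1]
            if r > best_r[j]:                      # keep the most correlated window
                best_n[j], best_r[j], cols[:, j] = n, r, acc[:, j]
    return cols, best_n, best_r
```

The returned `cols` would then serve as the predictor matrix of the RVT regression, and `best_n` records the window selected for each cell.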
The average standard error of the RVT model was 0.009. For all MERGE grid cells except two in the southern region of the watershed, the standard error was lower than 0.03, showing that the errors associated with the regression were low (Fig. S5-a). The p values were lower than 0.05 for 79 grid points, or 30% of the total (Fig. S5-b).
For the time-series validation, we found a skill of 0.92, a CD of 0.67 and an RMSE of 106.35 m³.s⁻¹ (Fig. 7). The modeled time series reflected both the main patterns and the seasonal variability of the data (Fig. 7), similar to the previously tested multiple regression model (Fig. 5).
The results obtained for both multiple regression models were similar. The modeled VGC discharge was better represented in the multiple regression estimates (Fig. 4) than in the linear, quadratic and exponential simulations (Figs. S2 and S3). All models underestimated discharge values higher than 1000 m³.s⁻¹, with the best results provided by the MLR models.
3.3 Model comparison
In this section, we evaluate all model results using the test subset and time series validations. Table 1 shows the skill, coefficient of determination (CD) and RMSE values derived in the test subset validation. Table 2 shows the results of the time-series validation for all models, including the skill, coefficient of determination, RMSE, Pearson’s correlation coefficient and the Nash–Sutcliffe efficiency (NSE) values.
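The validation metrics named above can be sketched in a few lines of Python for observed (`obs`) and modeled (`mod`) discharge. The sketch assumes that "skill" is Willmott's (1981) index of agreement, a common choice in coastal and hydrological modeling, but the section does not define it; the CD is omitted because its exact formula is not restated here.

```python
# Sketch only: validation metrics for aligned observed/modeled series.
# Assumption: "skill" = Willmott's index of agreement (not defined in this section).
import math


def _mean(x):
    return sum(x) / len(x)


def rmse(obs, mod):
    return math.sqrt(_mean([(m - o) ** 2 for o, m in zip(obs, mod)]))


def willmott_skill(obs, mod):
    """1 - sum((m - o)^2) / sum((|m - obar| + |o - obar|)^2)."""
    ob = _mean(obs)
    num = sum((m - o) ** 2 for o, m in zip(obs, mod))
    den = sum((abs(m - ob) + abs(o - ob)) ** 2 for o, m in zip(obs, mod))
    return 1 - num / den


def nse(obs, mod):
    """Nash-Sutcliffe efficiency: 1 - sum((o - m)^2) / sum((o - obar)^2)."""
    ob = _mean(obs)
    num = sum((o - m) ** 2 for o, m in zip(obs, mod))
    den = sum((o - ob) ** 2 for o in obs)
    return 1 - num / den


def pearson(obs, mod):
    ob, mb = _mean(obs), _mean(mod)
    sxy = sum((o - ob) * (m - mb) for o, m in zip(obs, mod))
    sxx = sum((o - ob) ** 2 for o in obs)
    syy = sum((m - mb) ** 2 for m in mod)
    return sxy / math.sqrt(sxx * syy)
```

A perfect model gives a skill, NSE and r of 1 and an RMSE of 0, whereas a model that always predicts the observed mean gives an NSE (and, here, a skill) of 0.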
Table 1 Skill, coefficient of determination (CD) and RMSE (m³.s⁻¹) values derived in the test subset validations of the linear, exponential, quadratic and multiple linear regression (MLR) models.
Model | Skill | CD | RMSE (m³.s⁻¹)
Linear | 0.83 | 0.51 | 125.26
Exponential | 0.81 | 0.52 | 124.26
Quadratic | 0.82 | 0.53 | 122.66
Multiple linear regression (R⁷) | 0.91 | 0.71 | 107.05
Multiple linear regression (RVT) | 0.91 | 0.70 | 104.34
Table 2 Results of the time-series validation: skill, coefficient of determination (CD), RMSE (m³.s⁻¹), Pearson’s correlation coefficient (with p value) and Nash–Sutcliffe efficiency (NSE) between each model result and the discharge time series estimated for the Valo Grande Channel using the method proposed by GEOBRÁS (1966), from 2011 to 2019.
Model | Skill | CD | RMSE (m³.s⁻¹) | Pearson’s correlation (p value) | Nash–Sutcliffe efficiency (NSE)
Linear | 0.84 | 0.54 | 141.61 | 0.73 (<0.01) | 0.54
Exponential | 0.84 | 0.57 | 136.38 | 0.77 (<0.01) | 0.57
Quadratic | 0.83 | 0.56 | 137.63 | 0.76 (<0.01) | 0.56
Multiple linear regression (R⁷) | 0.92 | 0.64 | 108.68 | 0.85 (<0.01) | 0.74
Multiple linear regression (RVT) | 0.92 | 0.67 | 106.35 | 0.86 (<0.01) | 0.75
|
In the test subset validation, the skill values were above 0.8 in all cases, with the highest values corresponding to 0.91 for the MLR models (Table 1). The CDs of the linear, quadratic and exponential models showed values of approximately 0.5, and the RMSE values were between approximately 122 m³.s⁻¹ and 125 m³.s⁻¹ (Table 1). For the test subset validation of the MLR models, the CD values were equal to 0.71 and 0.70, and the RMSE values were equal to 107.05 m³.s⁻¹ and 104.34 m³.s⁻¹ for the R⁷ and RVT models, respectively (Table 1).
For the linear, quadratic and exponential models, the time-series validation yielded skills between 0.83 and 0.84, CDs between 0.54 and 0.57, RMSEs from 136 m³.s⁻¹ to 142 m³.s⁻¹, and Pearson’s correlation coefficients between 0.73 and 0.77 (Table 2). Both MLR models reached a skill of 0.92 in the time-series validation (Table 2). In this validation, the CD values were 0.64 and 0.67, the RMSE values were 108.68 m³.s⁻¹ and 106.35 m³.s⁻¹, and the Pearson’s correlation coefficients were 0.85 and 0.86 for the MLR models using R⁷ and RVT, respectively (Table 2). The Pearson’s correlation coefficients of all five models presented p values lower than 0.01.
The NSE index presented an improvement of up to 28% for the multiple regression models when compared to the linear, quadratic and exponential models. The highest NSE index was equal to 0.75, which was found for the multiple regression model using RVT (Table 2). The multiple regression model using R⁷ presented an NSE index of 0.74, while the linear, quadratic, and exponential models presented NSE indices equal to 0.54, 0.56, and 0.57, respectively (Table 2).
Comparing the QQ plots derived for each model, the data quantiles were better reproduced by the MLR models considering R⁷ and RVT (Fig. 8-d and e, respectively) than by the linear, quadratic and exponential models (Fig. 8-a, b and c, respectively). The linear, quadratic and exponential models effectively represented VGC discharge values between 250 and 600 m³.s⁻¹, with the model results closely following the perfect-fit lines (in red) in the QQ plots (Fig. 8-a, b and c). At higher discharges (>600 m³.s⁻¹), the results of these models degraded, as they tended to underestimate peak values, with RMSEs from 136 m³.s⁻¹ to 141 m³.s⁻¹ for discharge values above 600 m³.s⁻¹. This limitation was more evident in the comparison between the modeled and observed VGC discharge time series (Fig. S3) but was reduced in the multiple regression models (Fig. 8-d and e), which showed better agreement with the data during high-flow events above 600 m³.s⁻¹, with RMSE values of 106.30 m³.s⁻¹ and 103.70 m³.s⁻¹ for the MLR models considering R⁷ and RVT, respectively. These results correspond to improvements of 22% to 27% for the MLR models relative to the other models.
The multiple regression models fit the data well for discharge values between 240 m³.s⁻¹ and 900 m³.s⁻¹ (Fig. 8-d and e). Values lower than 240 m³.s⁻¹ were overestimated and values above 900 m³.s⁻¹ were underestimated, with the errors generally increasing with the discharge magnitude. Nevertheless, these underestimations were smaller than those of the linear, quadratic and exponential models, showing the clear improvement of the multiple regression models over the other cases. The model predictions for specific periods of the VGC discharge time series containing high and low peaks (Fig. S8-a and S8-b, respectively) indicated that the multiple regression model using RVT reproduced the discharge data better than the other models. Both MLR models presented satisfactory results and can thus be used to provide reliable VGC discharge estimates.
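The QQ-plot comparison and the high-flow RMSE can be sketched as follows. The sketch assumes aligned daily series, quantile levels from 1% to 99% with NumPy's default linear interpolation, a high-flow threshold applied to the observed discharge, and a relative RMSE reduction, (RMSE_ref - RMSE_new) / RMSE_ref, as the meaning of "improvement". These are assumptions, and the helper names are hypothetical.

```python
# Sketch only: QQ quantile pairs, RMSE restricted to high flows, and a relative
# RMSE reduction. Assumptions: aligned daily series; "high flow" means observed
# discharge above `threshold`; "improvement" = fractional RMSE reduction.
import numpy as np


def qq_pairs(obs, mod, probs=np.linspace(0.01, 0.99, 99)):
    """Matched quantiles; modeled quantiles below observed ones indicate underestimation."""
    return np.quantile(obs, probs), np.quantile(mod, probs)


def rmse_above(obs, mod, threshold=600.0):
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    mask = obs > threshold                      # days with observed high flow
    return float(np.sqrt(np.mean((mod[mask] - obs[mask]) ** 2)))


def relative_reduction(rmse_ref, rmse_new):
    """Fractional RMSE reduction of a model relative to a reference model."""
    return (rmse_ref - rmse_new) / rmse_ref
```

For example, `relative_reduction(136, 106.30)` is about 0.22, which matches the lower end of the improvement range quoted above under this assumed definition.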
3.4 Reconstruction of complete time series and seasonal comparisons (2000-2020)
Based on the results of the previous sections, the MLR model considering RVT presented the most accurate predictions. We therefore applied this model to reconstruct a complete time series of VGC discharge (Fig. S6). Since the MERGE data start in June 2000, the time series spans from June 2000 to December 2020 (Fig. S6).
The seasonal patterns of precipitation in the Ribeira de Iguape watershed and of the reconstructed VGC discharge series were evaluated using monthly averages with 95% confidence intervals (Fig. 9). The monthly mean MERGE rainfall presented the highest values from October to March (Fig. 9-a), varying from 130 mm in November (confidence interval from 111 mm to 149 mm) to 241 mm in January (confidence interval from 211 mm to 270 mm). The highest mean rainfall and discharge values occurred in January (241 mm and 582 m³.s⁻¹, respectively), and the lowest values occurred in August (66 mm and 325 m³.s⁻¹, respectively), considering the period from 2000 to 2020 (Fig. 9). From April to July, rainfall and VGC discharge varied relatively little, oscillating between 88 and 93 mm and between 342 and 368 m³.s⁻¹, respectively (Fig. 9). Both began increasing in September, with monthly means of 100 mm and 380 m³.s⁻¹ (confidence intervals of approximately 74 mm to 133 mm and 370 m³.s⁻¹ to 390 m³.s⁻¹, respectively), and reached their maximum in January.
The seasonal variability of rainfall modulated that of the VGC discharge. Both the rainfall and VGC discharge series presented their highest values in summer (December, January and February) and in March, with values greater than 150 mm and 470 m³.s⁻¹, respectively. The lowest values were found in winter (June, July and August), at 89 mm and 365 m³.s⁻¹, with the minimum in August, at 66 mm and 325 m³.s⁻¹ for rainfall and discharge, respectively (Fig. 3 and Fig. 9).
The highest rainfall variabilities were observed in September, January and July, with confidence intervals from 74 mm to 133 mm, from 211 mm to 270 mm, and from 61 mm to 117 mm, respectively (Fig. 9-a). For the monthly mean VGC discharge (Fig. 9-b), the confidence intervals were narrower from March to November than in December, January and February; for example, they ranged from 383 m³.s⁻¹ to 405 m³.s⁻¹ in October and from 350 m³.s⁻¹ to 380 m³.s⁻¹ in June. During December, January and February, the confidence intervals ranged from 420 m³.s⁻¹ to 453 m³.s⁻¹, from 560 m³.s⁻¹ to 603 m³.s⁻¹, and from 481 m³.s⁻¹ to 520 m³.s⁻¹, respectively (Fig. 9-b).
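Because the section does not state how the 95% confidence intervals were computed, the following is only a sketch under explicit assumptions: each calendar month's sample is its daily values pooled across years, and the interval is mean ± 1.96·s/√n (normal approximation). Daily autocorrelation would make such an interval too narrow, so it is illustrative only; `monthly_ci` is a hypothetical name.

```python
# Sketch only: calendar-month means with approximate 95% confidence intervals.
# Assumptions: daily values pooled by calendar month across years; interval is
# mean +/- z * s / sqrt(n) with z = 1.96 (normal approximation).
import datetime as dt
import math
import statistics
from collections import defaultdict


def monthly_ci(dates, values, z=1.96):
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[d.month].append(v)
    out = {}
    for month, vals in sorted(groups.items()):
        m = statistics.fmean(vals)
        half = z * statistics.stdev(vals) / math.sqrt(len(vals))
        out[month] = (m, m - half, m + half)    # (mean, lower, upper)
    return out


if __name__ == "__main__":
    # Hypothetical example with one year of synthetic daily values.
    days = [dt.date(2001, 1, 1) + dt.timedelta(days=i) for i in range(365)]
    vals = [float(d.day) for d in days]
    for month, (m, lo, hi) in monthly_ci(days, vals).items():
        print(f"{month:2d}: {m:6.2f} [{lo:6.2f}, {hi:6.2f}]")
```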
The climatology of the VGC discharge was computed using the daily discharge estimates obtained with the multiple regression model considering RVT. These results were compared with the monthly mean VGC discharges derived from the daily discharge data available from 2011 to 2019 (Fig. 10). For this period, the comparison between the monthly mean VGC discharges from the multiple regression model considering RVT and from the data yielded a skill of 0.96, an RMSE of 51.61 m³.s⁻¹ and a Pearson’s correlation coefficient of 0.95 (p value < 0.01). Therefore, the observed VGC monthly means were well correlated with those obtained with the multiple regression model considering RVT (Fig. 10), which allowed us to extend the climatological period to the whole period covered by the MERGE data. The highest monthly mean values occurred in January in both the data and modeled series, at 570 m³.s⁻¹ and 530 m³.s⁻¹, respectively (Fig. 10). The lowest values occurred in September, at 280 m³.s⁻¹ and 300 m³.s⁻¹ in the data and model series, respectively (Fig. 10). The best fit between the data and modeled monthly means occurred in November, followed by June and August (Fig. 10). From January to April and in July, the model predictions were lower than the data, while in May, August, September, October and December the model overestimated the discharge compared with the VGC data (Fig. 10). The standard deviations were similar between the data and the multiple regression model considering RVT (Fig. 10). The highest standard deviations were found in January (from approximately 250 m³.s⁻¹ to 850 m³.s⁻¹) and June (from approximately 180 m³.s⁻¹ to 750 m³.s⁻¹), followed by February and August, with values between 260 m³.s⁻¹ and 720 m³.s⁻¹ and between 90 m³.s⁻¹ and 550 m³.s⁻¹, respectively (Fig. 10).
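The climatology comparison can be sketched as grouping daily values by calendar month, computing each month's mean and standard deviation, and taking the sign of the model-minus-data difference to identify months of under- or overestimation. The pooling of daily values across years, the mean ± one standard deviation envelope, and the names `climatology` and `monthly_bias` are assumptions for illustration.

```python
# Sketch only: monthly climatology (mean, standard deviation) of daily discharge
# and the month-by-month bias of a modeled climatology against observations.
# Assumptions (hypothetical): daily values pooled by calendar month across years.
import datetime as dt
import statistics
from collections import defaultdict


def climatology(dates, values):
    """Return {month: (mean, std)}; mean +/- std gives one possible spread envelope."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[d.month].append(v)
    return {mo: (statistics.fmean(v), statistics.stdev(v)) for mo, v in sorted(groups.items())}


def monthly_bias(clim_model, clim_obs):
    """Positive: model overestimates the monthly mean; negative: underestimates."""
    return {mo: clim_model[mo][0] - clim_obs[mo][0] for mo in clim_obs if mo in clim_model}


if __name__ == "__main__":
    # Hypothetical example: synthetic "data" and a model biased high in January.
    days = [dt.date(2012, 1, 1) + dt.timedelta(days=i) for i in range(366)]
    data = [500.0 + (d.day % 2) * 100 for d in days]
    model = [v + (20.0 if d.month == 1 else -10.0) for d, v in zip(days, data)]
    print(monthly_bias(climatology(days, model), climatology(days, data)))
```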
Anomalously high values were detected in June from 2011 to 2019 (Fig. 10). Curiously, during this period, June presented higher monthly mean values (approximately 450 m³.s⁻¹) than May and July (approximately 340 m³.s⁻¹ and 320 m³.s⁻¹, respectively) (Fig. 10). This pattern contrasted with the monthly mean June discharge obtained for 2000 to 2010 and for 2020 (Fig. 11). It resulted from the high daily variability and the occurrence of discharge peaks in June in 6 of the 9 years from 2011 to 2019 (Fig. S7). These anomalous June discharges were caused by high precipitation in the same period, which recurred in 2012, 2013, 2014, 2016, 2017 and 2019 (Fig. S7). These June anomalies were also found by Marta-Almeida et al. (2021) but were not investigated further here, as the topic was beyond the scope of this study.
We also present the monthly mean values from 2000 to 2020 predicted from the daily estimates of the multiple regression model considering RVT, compared with the observed VGC discharges from the DAEE for 2011 to 2019 (Fig. 11). The VGC discharge values estimated from the DAEE data from 2011 to 2019 (orange bars in Fig. 11 (l to t)) were effectively represented by the multiple regression model considering RVT (blue bars in Fig. 11 (l to t)). In 12 of the 20 years, the highest monthly mean modeled VGC discharge occurred in January, varying from approximately 340 m³.s⁻¹ to 850 m³.s⁻¹ (Fig. 11). In 2001, 2006, 2007, 2011, 2016, 2017 and 2018, the highest monthly VGC discharge values occurred in summer (here considering January, February and March) (Fig. 11 b, g, h, l, q, r and s, respectively). High VGC discharge values were also noticed in December in 2001, 2007, 2008, 2010, 2015 and 2020, varying from 400 m³.s⁻¹ to 850 m³.s⁻¹ (Fig. 11 b, h, i, k, p and u, respectively). The lowest values occurred during winter (June, July and August) only in 2002, 2006 and 2010 (Fig. 11 c, g and k, respectively).
High standard deviation values, from approximately 300 to 700 m³.s⁻¹, were detected in the climatology for June, July and August (winter) (Fig. 10). This variability resulted mainly from anomalously high VGC discharge values in June, evident in 2012, 2013, 2014, 2016, 2017 and 2019 (Fig. 11 m, n, o, q, r and t, respectively); in some cases, these discharges were similar to or even higher than the summer values. This clearly occurred in June 2012 (Fig. 11 (m)) and June 2019 (Fig. 11 (t)), when the VGC discharge reached 750 m³.s⁻¹ and 610 m³.s⁻¹, respectively. The average standard deviations in June were also high, at 295 m³.s⁻¹ for the model results and 190 m³.s⁻¹ for the data. Anomalous VGC discharge values were also detected in August 2011, reaching 700 m³.s⁻¹ and 600 m³.s⁻¹ for the data and model results, respectively (Fig. 11 (l)). The highest standard deviations were found in December, January and February, with average values reaching 297 m³.s⁻¹ for the model results and 274 m³.s⁻¹ for the data in January (Fig. 11).