Hybrid (dynamical-statistical) models to improve seasonal drought prediction: application to regional meteorological droughts in China

Accurate drought prediction is important for drought resistance and water resources management. However, seasonal drought prediction remains of low accuracy for both dynamical and statistical models. In this study, we combined dynamical models and machine learning to construct hybrid (dynamical-statistical) models. We used the random forest approach to identify representative regions based on geopotential height, sea-level pressure, and 2-m temperature. The least absolute shrinkage and selection operator (Lasso) and an artificial neural network (ANN) were used to construct the statistical models, with atmospheric variables as predictors and the 3-month Standardized Precipitation Index (SPI3) as the predictand. The atmospheric variables forecasted by the European Centre for Medium-Range Weather Forecasts (ECMWF) SEAS5 model were processed as predictors to force the statistical models. The resulting hybrid models, constructed using dynamical models and machine learning, were named dynamic-Lasso (‘D-Lasso’) and dynamic-ANN (‘D-ANN’), respectively. The results suggested that prediction skills were improved by the hybrid models; compared to the best available dynamical model (UK Met Office), D-ANN extended the forecast horizons by 6, 21, and 4 days in northern, eastern, and southern China, respectively. In spring and summer, the correlation skills were also improved. The effective prediction of the atmospheric anomalies over the eastern and southern Tibetan Plateau and the Northwest Pacific region was identified as the main contributor to successful seasonal drought prediction. Overall, the hybrid models were able to predict drought processes effectively, and D-ANN outperformed D-Lasso in the drought onset and persistence phases.


Daily-updated Standardized Precipitation Index
As a seasonal drought index, the SPI3 was calculated for the three study regions using area-averaged precipitation data. Traditionally, the SPI3 varies on a monthly scale, reflecting cumulative precipitation over the past three months; however, a monthly timescale does not adequately reflect the evolution of drought onset, persistence, and relief, which limits its use for the real-time monitoring and prediction needed for effective management. Therefore, we followed the World [...] and 90 are the 90-day average and standard deviation for the period from 3 January 1981 to 1 April 2010. The ERA5 reanalysis dataset for the 40-year period between 1980 and 2019 was processed into SA90, from which we obtained the SA90 for all of the considered atmospheric variables (i.e. SA90GH200, SA90GH500, SA90GH850, SA90T2M, and SA90SLP).
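As an illustration of the kind of processing behind SA90, the following minimal sketch computes a 90-day running mean and standardizes it against a reference period. It assumes that SA90 is a 90-day mean expressed as a standardized anomaly. The function names (`rolling_mean`, `standardized_anomaly`) and the use of a single baseline mean and standard deviation are assumptions made for illustration, not the processing chain used in this study.

```python
from statistics import mean, pstdev


def rolling_mean(values, window=90):
    """Trailing running mean: each output averages the latest `window` values."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(running / window)
    return out


def standardized_anomaly(series, baseline):
    """Standardize `series` with the mean and standard deviation of `baseline`.

    Assumption: one climatological mean and standard deviation for the whole
    baseline; a per-calendar-day climatology would be a natural refinement.
    """
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu) / sigma for x in series]


# Hypothetical usage with a short synthetic daily series.
daily = [float(d % 7) for d in range(200)]
smoothed = rolling_mean(daily, window=90)
anomalies = standardized_anomaly(smoothed, baseline=smoothed)
```

Using the same period for the series and the baseline, as in the usage lines, gives anomalies with zero mean; in practice the baseline would be a fixed reference period.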

Construction of ML models
We constructed ML models for each of the three drought regions using 32 years of data from 1980 to 2011 for training and 8 years of data from 2012 to 2019 for validation. The models adopted the daily five-layer SA90 dataset as the predictor and the SPI3 as the predictand. The model structure is shown in Fig. 2. First, we flattened the five-layer SA90 into a column of 5 × 51 × 61 data points (the input neurons), extracted representative data through the RF, and then produced the output SPI3 data by forcing the Lasso and ANN. The ML models (RF, Lasso, and ANN) are described in detail in Supplement 1-3.
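The data layout described above can be sketched as follows. This is a minimal illustration that covers only the flattening step and the split into training and validation years; the RF, Lasso, and ANN steps are not reproduced. The sample structure (a dictionary with `year`, `predictors`, and `spi3` keys) and the function names are hypothetical.

```python
N_LAYERS, N_ROWS, N_COLS = 5, 51, 61  # five SA90 layers on a 51 x 61 grid


def flatten_field(field):
    """Flatten a nested [layer][row][col] field into one flat list."""
    return [v for layer in field for row in layer for v in row]


def split_by_year(samples,
                  train_years=range(1980, 2012),
                  valid_years=range(2012, 2020)):
    """Split samples into training (1980-2011) and validation (2012-2019)."""
    train = [s for s in samples if s["year"] in train_years]
    valid = [s for s in samples if s["year"] in valid_years]
    return train, valid


# Hypothetical usage: one synthetic sample per year, all predictor values zero.
zero_field = [[[0.0] * N_COLS for _ in range(N_ROWS)] for _ in range(N_LAYERS)]
samples = [
    {"year": y, "predictors": flatten_field(zero_field), "spi3": 0.0}
    for y in range(1980, 2020)
]
train, valid = split_by_year(samples)
```

Each flattened sample has 5 × 51 × 61 = 15 555 input values, which is why a feature-selection step such as the RF is useful before fitting the Lasso and ANN.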

Calculation of the prospective Nth-day SPI3
The ML models constructed in Sect. 3.3.1 incorporate the contemporaneous statistical relationships between the predictors and [...] for the prospective N days, and then forced the ML models to predict the Nth-day SPI3, as shown in Fig. 3. Therefore, the drought prediction models we constructed were hybrid (dynamical-statistical) models, which we named dynamic-Lasso ('D-Lasso') and dynamic-ANN ('D-ANN'), respectively.

Fig. 3 Schematic representation of the calculation process for the prospective Nth-day SPI3

GCM hindcast precipitation
The SPI3 values calculated from the ECMWF, UKMO, and Météo France hindcast precipitation datasets were compared with those of the hybrid models. For prediction, we used observed precipitation for the past (90-1-N) days and GCM outputs for the prospective N days, and then calculated the Nth-day SPI3.
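The blending of observed and forecast precipitation can be sketched as below. This is an illustration under stated assumptions, not the procedure used in this study. The accumulation window is taken to be 90 days in total, so the observed part contributes 90 − N values (the bookkeeping in the text, which counts 90-1-N past days, may treat the current day separately). In addition, a simple empirical plotting position replaces the fitted probability distribution normally used to compute the SPI. The function names are hypothetical.

```python
from statistics import NormalDist


def blended_total(observed, forecast, lead_days, window=90):
    """Sum the latest (window - lead_days) observed values and the first
    lead_days forecast values, giving a window-day precipitation total."""
    if not 0 <= lead_days <= window:
        raise ValueError("lead_days must lie between 0 and window")
    n_obs = window - lead_days
    if len(observed) < n_obs or len(forecast) < lead_days:
        raise ValueError("not enough observed or forecast days")
    obs_part = observed[len(observed) - n_obs:]
    fc_part = forecast[:lead_days]
    return sum(obs_part) + sum(fc_part)


def empirical_index(total, climatology):
    """Map a total to a standard-normal quantile via its empirical rank.

    Assumption: plotting position (rank + 0.5) / (n + 1), used here in place
    of a fitted distribution.
    """
    n = len(climatology)
    if n == 0:
        raise ValueError("climatology must not be empty")
    rank = sum(1 for c in climatology if c <= total)
    p = (rank + 0.5) / (n + 1)
    return NormalDist().inv_cdf(p)


# Hypothetical usage: 100 observed days, a 30-day forecast, lead time N = 10.
observed = [1.0] * 100
forecast = [2.0] * 30
total = blended_total(observed, forecast, lead_days=10)
index = empirical_index(total, climatology=[80.0, 90.0, 110.0, 120.0])
```

As N grows, the forecast share of the window grows with it, so forecast errors weigh more heavily on the resulting index at longer lead times.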

Evaluation metrics
Root-mean-square error (RMSE) was adopted as the loss function to describe the error between the model output and the observed data, and for the inverse calculation of the network residuals, as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2} \qquad (2)$$

where $x_i$ represents the observed value, $y_i$ represents the predictand, and $n$ represents the sample size.

The correlation coefficient (Corr) was used as the performance evaluation function for ML to measure the correlation between the model output and the expected values:

$$\mathrm{Corr} = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}}$$

where $x_i$ is the observed value, $y_i$ is the predictand, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sample mean), and analogously for $\bar{y}$; and $n$ represents the sample size.

4 Results

Identification of droughts
A drought event is defined as a run of more than 60 consecutive days with daily-updated SPI3 values below -0.5. Based on this approach, the droughts identified between 1979 and 2019 in the study regions are shown in Table 2.

Table 2

[...] between September and December. To show the differences between the two models, the skills based on D-ANN were subtracted from the UKMO outputs (Fig. 6(e-i)). Compared to the UKMO outputs, D-ANN performed poorly from October to January but was more skillful across the spring and summer months. For lead times exceeding 30 days, D-ANN had a clear advantage. However, the correlation skills varied considerably among the regions, with higher skills in the east and south regions than in the north.

Fig. 7(a-c) shows that D-ANN can effectively predict seasonal droughts across the different regions. The color maps (Fig. 7(d-e)) quantify the contribution of each grid predictor to the prediction, where positive and negative values indicate that the predictor contributes positively and negatively to SPI3, respectively.

The representative regions identified by the RF for GH850 and SLP are mainly located in the east and south of the Tibetan Plateau and the Northwest Pacific. In addition, the representative regions in the north are sparser than in the east and south, which suggests that the RF could not effectively extract key information from the atmospheric variables for the north region. Importantly, the East Asian [...]

If a hidden layer has too few nodes, it is difficult for a network to learn and acquire information-processing capabilities. Conversely, too many nodes in a hidden layer result in over-fitting (Zou et al. 2009). We trained the model by modifying the number of hidden nodes in the network while maintaining the number of layers at 1, the results of which are shown in Table 3. We found that when fewer than 16 nodes were used, model performance was compromised, and when more than 16 nodes were included, the model became more complex but did not perform better on the validation dataset.

Table 3

The neural network learns from the data through the nodes in each layer, and the stacking of multiple layers allows the structure of complex applications to be learned (Pan 2019). However, it remains questionable whether a deeper network (i.e. with more layers) produces a stronger model (Tan 2020). To test this, keeping the number of nodes at 16, we varied the number of layers, as shown in Table 4. Based on this, we found that network performance did not improve as the number of layers was increased. From the analyses in Sect. 5.1.1 and Sect. 5.1.2, we found that the relationship between contemporaneous atmospheric circulation parameters and regional meteorological drought may not be overly complex, meaning that a single-layer, 16-node ANN model was able to effectively represent nonlinear relationships.
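The drought-event definition introduced under 'Identification of droughts' (more than 60 consecutive days with daily-updated SPI3 below -0.5) amounts to a run-length search over the daily series. The following minimal sketch illustrates this; the function name and the choice to return inclusive index pairs are assumptions made for illustration.

```python
def find_drought_events(spi3, threshold=-0.5, min_days=60):
    """Return inclusive (start, end) index pairs of runs in which
    spi3 < threshold for more than min_days consecutive days."""
    events = []
    start = None
    for i, value in enumerate(spi3):
        if value < threshold:
            if start is None:
                start = i  # a new dry run begins
        else:
            if start is not None and i - start > min_days:
                events.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(spi3) - start > min_days:
        events.append((start, len(spi3) - 1))
    return events


# Hypothetical usage: a 61-day run qualifies, whereas a 60-day run does not.
series = [0.0] * 10 + [-1.0] * 61 + [0.0] * 5 + [-1.0] * 60 + [0.0] * 3
events = find_drought_events(series)
```

The comparisons are strict on both counts: a value of exactly -0.5 does not count as a drought day, and a run of exactly 60 days is not an event.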

Anomaly correlation coefficients (ACC) of forecasted atmospheric variables
We used the forecasted atmospheric circulation dataset from the ECMWF as a predictor to force the hybrid models. Therefore, the models contained two kinds of error: the first from the ML models and the second from the GCM. This section focuses on the impact of the second error source. Fig. 9 shows that the ACC of all variables decreased with increasing lead time, which indicates that the error from the ECMWF dataset introduced substantial uncertainty into the prediction. Indeed, forecasting skills are almost entirely lost for lead times above 50 days.
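For readers unfamiliar with the metric, the ACC can be sketched as the correlation between forecast and observed departures from climatology. The version below is the uncentered form, in which the mean anomaly is not removed. Whether this study used the centered or the uncentered form is not stated in the text, so this choice, like the function name, is an assumption.

```python
from math import sqrt


def anomaly_correlation(forecast, observed, climatology):
    """Uncentered anomaly correlation over matching grid points or times."""
    if not (len(forecast) == len(observed) == len(climatology)):
        raise ValueError("inputs must have equal length")
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    o_anom = [o - c for o, c in zip(observed, climatology)]
    numerator = sum(fa * oa for fa, oa in zip(f_anom, o_anom))
    denominator = sqrt(sum(fa * fa for fa in f_anom)) * sqrt(
        sum(oa * oa for oa in o_anom)
    )
    if denominator == 0:
        raise ValueError("anomalies are all zero")
    return numerator / denominator


# Hypothetical usage: a perfect forecast and one with reversed anomalies.
climatology = [10.0, 10.0, 10.0, 10.0]
observed = [11.0, 9.0, 12.0, 8.0]
acc_perfect = anomaly_correlation(observed, observed, climatology)
acc_reversed = anomaly_correlation([9.0, 11.0, 8.0, 12.0], observed, climatology)
```

A value near 1 indicates that the forecast reproduces the observed anomaly pattern, whereas values near 0 correspond to the loss of skill reported for long lead times.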

Perfect drought prediction
We replaced all of the atmospheric data for the prospective 90 days with reanalysis data to re-force the hybrid models, which we named 'Perfect-D-Lasso' and 'Perfect-D-ANN' (Fig. 10). The correlation skills of the Perfect hybrid models were almost always above 0.8. Moreover, the performance of the models differed between the study regions, with the Perfect-D-ANN model performing best in the north region. Methods for post-processing GCM precipitation outputs have also been included in dynamical-statistical prediction methods (Schepen et al. 2016), and so it is necessary to correct the forecasted atmospheric variables accordingly.

Seasonal drought prediction is important for drought resistance and water resources management. Here, we constructed seasonal drought prediction models with atmospheric variables (GH200, GH500, GH850, T2M, and SLP) as predictors and the SPI3 as the predictand. The resulting models were applied and evaluated for the prediction of meteorological droughts in the north, east, and south regions of China. The main conclusions can be summarized as follows: (1) the daily-updated SPI3 was used to identify droughts for the last 40 years in the three drought regions; (2) the five-layer atmospheric variables were [...] 1980-2011 and validated for the period 2012-2019, demonstrating good model robustness; (4) hybrid models (D-Lasso and D-ANN) were constructed by combining the ECMWF and ML models; (5) month-by-month drought prediction was performed for the period between 1993 and 2016, and, compared to the dynamical models (Météo France, ECMWF, and UKMO), D-ANN not only extended the forecast skill horizons in all regions but also offered higher predictive abilities in spring and summer; (6) we quantified the contribution of representative regions extracted using an RF approach, finding that these areas mainly correspond to those areas where the CDC calculates the East Asian Summer/Winter Monsoon indices; and (7) the D-ANN model is more skillful at predicting the onset and persistence phases of drought but performs comparatively poorly during the relief phase.