RainCast: A Rapid Update Rainfall Forecasting System for New Zealand

RainCast is a rapid update forecasting system that has been developed to improve short-range rainfall forecasting in New Zealand. This system blends extrapolated nowcast information with multiple forecasts from numerical weather prediction (NWP) models to generate updated rain forecasts every hour. It is demonstrated that RainCast is able to outperform the rainfall forecasts produced from NWP systems out to 24 hours, with the greatest improvement in the first 3-4 hours. The limitations of RainCast are also discussed, along with recommendations on how to further improve the system.


Introduction
Short-range quantitative precipitation forecasting (QPF) plays an important role in both meteorological and hydrological risk management. Traditionally, QPF can be obtained through either a complex NWP model (e.g., Benjamin et al., 2004, 2016; Skamarock et al., 2008; Sun et al., 2012; Wilson and Roberts, 2006), or a relatively straightforward statistical approach (e.g., Bowler et al., 2006; Haiden et al., 2010; Seed, 2004). In an operational environment a meteorologist may take the inputs from both methods and, after considering other meteorological factors, adjust them to produce finalized QPF guidance.

There have been substantial improvements in NWP forecasts since the 1990s, arising from improved data assimilation and better observations (e.g., Barker

However, a state-of-the-art NWP-based rapid cycling system needs significant computational resources which most regional centres cannot afford. Additionally, the challenges of having and maintaining the expertise in observation quality control, forecast model development and data assimilation make the implementation of such a system difficult for many local weather authorities. To establish the capability of producing frequently updated QPF with lower resource requirements than NWP, many statistical prediction systems have been developed and are widely used in regional operational centres. Such a statistical system usually uses local observations (e.g., radar) in near real time to provide Eulerian or Lagrangian persistence-based nowcasts (e.g., Dixon and Wiener, 1993; Bowler et al., 2006; Browning, 1980). Some of them are also capable of incorporating NWP information to fill the gap between nowcast and NWP forecasts, and to extend the system's useful lead time. A good example of such a statistical system is the Short-Term Ensemble Prediction System (Bowler et al., 2006; Seed, 2003; Seed et al., 2013), also called STEPS, which is an ensemble-based probabilistic precipitation forecasting scheme that blends an extrapolation nowcast with a downscaled NWP forecast.

At MetService, STEPS has been adapted to create a rapid update rainfall forecasting system called "RainCast". Multiple NWP models are blended with an extrapolation-based nowcast from STEPS to form a "super ensemble" forecast which is updated every hour. The ensembles are then further enhanced by the neighborhood probability method (e.g., Ebert, 2008; Schwartz et al., 2010), and eventually the system generates probability forecasts out to 24 hours. The system has been implemented and evaluated in New Zealand through a demonstration project since September 2020, and the findings from this project along with the details of the system are presented in this paper.

The methods used in RainCast are described in Section 2, followed by a case study of an event which occurred between 00Z and 03Z 14 June 2021 in Section 3. Section 4 provides an objective verification skill score for RainCast and compares it to the NWP models used at MetService. The limitations of RainCast are discussed in Section 5, and a short summary is provided in Section 6.

RainCast is based on the STEPS algorithm described by Bowler et al. (2007). In RainCast, multiple independent STEPS tasks based on different NWP models are triggered simultaneously to create a cluster of "super ensemble" members every hour. The subjective evaluation from forecasters at MetService suggests that even with the "super ensemble", the spread of ensemble members is not sufficient to handle all rain situations accurately, particularly during severe convection events. To address this, the neighborhood probability method (e.g., Evans et al., 2018) is introduced as a post-processing step in RainCast. In this section, the adapted STEPS algorithm is briefly introduced (Section 2.1) followed by the neighborhood probability method (Section 2.2).

The adapted STEPS algorithm is briefly described in this section. The algorithm takes the essential components from the native STEPS approach (e.g., Bowler et al., 2007; Seed et al., 2013) and uses a simplification scheme to run it efficiently in the operational environment.

The two main purposes of STEPS are:

(1) Combine the extrapolation-based nowcast (e.g., from radar data and usually with a forecast lead time of less than 1-2 hours) with NWP data to provide a smooth transition.

(2) Generate probability rainfall forecasts through a stochastic scheme for representing the unrecognizable features from both the extrapolation method and NWP models.

Radar data extrapolation at MetService is carried out using the Lagrangian persistence method described by Germann and Zawadzki (2002). This scheme moves radar echoes along stationary motion fields derived from Optical Flow (OF): the OF winds used to move each radar echo stay the same during the period of extrapolation.
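As a rough illustration of Lagrangian persistence with a stationary motion field, the sketch below moves a 2-D rain field along constant winds using a backward nearest-neighbour lookup. It assumes the common persistence form $Z(x, y, t+\tau) = Z(x - u\tau, y - v\tau, t)$; the symbols $Z$, $u$ and $v$, the function name `advect`, and the use of grid cells per time step as units are illustrative assumptions, not the MetService implementation.

```python
import numpy as np

def advect(field, u, v, steps):
    """Move `field` along a stationary motion field (u, v) for `steps` time steps.

    u, v are displacements in grid cells per time step (scalars or arrays
    shaped like `field`) and are held fixed during the extrapolation.
    A single straight backward step is used, which is only a coarse
    approximation when u and v vary in space.
    Cells whose origin falls outside the domain are set to 0 (no echo).
    """
    ny, nx = field.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    # Backward trajectory: where did the echo now at (yy, xx) come from?
    src_x = np.rint(xx - u * steps).astype(int)
    src_y = np.rint(yy - v * steps).astype(int)
    inside = (src_x >= 0) & (src_x < nx) & (src_y >= 0) & (src_y < ny)
    out = np.zeros_like(field)
    out[inside] = field[src_y[inside], src_x[inside]]
    return out

radar = np.zeros((5, 5))
radar[2, 1] = 10.0                                # a single echo
nowcast = advect(radar, u=1.0, v=0.0, steps=2)    # echo moves 2 cells in +x
```

Because the motion field is frozen, the echo keeps its intensity and shape; growth and decay of rain are not represented, which is one reason the nowcast loses skill quickly with lead time.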

The extrapolation-based nowcast is then blended with NWP forecasts in spectral space, and the spatial scales

In this study, the correlation coefficient for NWP at each cascade level is not lead-time dependent, due to the assumption that the NWP skill does not change significantly over a short period. On the other hand, the correlation coefficient for the extrapolation-based nowcast is calculated depending on the lead time from T+1h to T+6h. After 6 hours it is assumed that the extrapolation-based nowcast has no skill for all cascade levels, and its correlation coefficient is therefore set to 0.

The weights for the NWP, nowcast and noise terms are calculated from the aggregated correlation coefficients for the NWP forecasts and the extrapolation-based nowcasts. In this paper, the gridded New Zealand national Quantitative Precipitation Estimate (QPE) is considered as "ground truth".
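The sketch below shows one simple way that three such weights could be derived from the two correlation coefficients. It is an assumed scheme for illustration only, not the formulation used in RainCast: each source is weighted by its correlation against the QPE, and the noise term receives whatever weight is left so that the three sum to 1. The function name `blend_weights` and its arguments are hypothetical.

```python
def blend_weights(rho_nwp, rho_ext):
    """Return (w_nwp, w_ext, w_noise) for one cascade level and lead time.

    Assumed scheme (not the paper's exact formula): negative correlations
    are treated as no skill, skilful sources are weighted by correlation,
    and noise receives the remaining weight so the three sum to 1.
    """
    rho_nwp = max(rho_nwp, 0.0)
    rho_ext = max(rho_ext, 0.0)
    total = rho_nwp + rho_ext
    if total > 1.0:
        # Rescale so the noise weight is never negative.
        rho_nwp, rho_ext = rho_nwp / total, rho_ext / total
    w_noise = 1.0 - (rho_nwp + rho_ext)
    return rho_nwp, rho_ext, w_noise

# Beyond 6 hours the nowcast correlation is set to 0,
# so only the NWP and noise terms contribute.
w_nwp, w_ext, w_noise = blend_weights(rho_nwp=0.6, rho_ext=0.0)
```

Under this assumption, the less skilful the two sources are at a given scale, the more of the forecast at that scale is filled with stochastic noise, which is what widens the ensemble spread.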

At MetService, the NP method gives the probability of rainfall at a given threshold (in mm) by combining the STEPS ensemble members, evaluated at that threshold, from every base model, with the contribution of each base model scaled by its weight.
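A minimal sketch of a weighted neighbourhood probability is given below. It assumes a square neighbourhood, zero padding at the domain edge, and a weighted average across base models; the function names, the `radius` parameter and the dictionary layout are hypothetical, and the exact expression used at MetService may differ.

```python
import numpy as np

def neighbourhood_fraction(binary, radius):
    """Fraction of cells equal to 1 in a (2*radius+1)^2 window around each cell.

    Cells outside the domain count as 0 (zero padding).
    """
    k = 2 * radius + 1
    padded = np.pad(binary.astype(float), radius)
    ny, nx = binary.shape
    total = np.zeros((ny, nx))
    for dy in range(k):
        for dx in range(k):
            total += padded[dy:dy + ny, dx:dx + nx]
    return total / (k * k)

def rain_probability(members_by_model, weights, threshold_mm, radius=1):
    """Weighted neighbourhood probability of rainfall >= threshold_mm.

    members_by_model: dict model -> array (n_members, ny, nx), rainfall in mm
    weights: dict model -> non-negative weight (normalised here)
    """
    wsum = sum(weights.values())
    prob = 0.0
    for model, members in members_by_model.items():
        exceed = members >= threshold_mm
        # Average the neighbourhood fractions over this model's members.
        frac = np.mean([neighbourhood_fraction(m, radius) for m in exceed], axis=0)
        prob = prob + (weights[model] / wsum) * frac
    return prob

# Hypothetical example: one wet model and one dry model.
members = {"a": np.full((2, 4, 4), 5.0), "b": np.zeros((3, 4, 4))}
p = rain_probability(members, {"a": 3.0, "b": 1.0}, threshold_mm=2.0, radius=0)
```

Counting exceedances in the surrounding cells as well as across members is what lets the method compensate for an ensemble whose spread is too small.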

The "training" is also performed over the spectral space, as described in the STEPS algorithm (Section 2.1), to merge the extrapolation-based nowcasts with NWP forecasts. Figure 5 shows

The relative skills between the nowcast and NWP are also dependent on the spatial scale. For example, at the 1650 km scale, the extrapolation-based nowcast had better skill relative to the NWP out to T+2h, then after T+2h its skill declined relative to NWP. Likewise, at the 314 km scale, the extrapolated nowcast had better skill in comparison to the NWP out to T+3h, then afterwards its skill declined relative to the NWP. At a small scale (e.g., 11 km), there is little skill from the extrapolation-based nowcast, and it is always lower than the NWP forecast.

Operationally, the above training process is carried out with the hourly update cycle of RainCast. Such a frequent update means that the latest NWP data can be utilized in the parameter estimation. However, the skill of RainCast could be compromised because the training is performed over the entire domain, and it may not handle well an event in a small area of interest (e.g., an area of most meteorological significance). Moreover, the weights are estimated over the last 12 hours before RainCast's analysis time, which does not necessarily mean these weights will continue to produce the best skill for subsequent forecast hours.

Forecast of rainfall probability vs QPE
Considering the contributions from the extrapolation-based nowcast, RainCast is expected to bring obvious additional value to rainfall prediction within the first 1-2 hours. Additionally, with RainCast multiple NWP models are evaluated and blended with the nowcast so improvements over a longer lead time can also be anticipated. In

From Figure 6, the forecast probability of rain at a low threshold (e.g., 2.0 mm) gives a good match to the QPE. RainCast had a probability of greater than 30% of rain during the period of interest, whereas QPE showed it to be dry. Moreover, little skill was shown when the threshold was increased to 7.5 mm.

The above study suggests that for this event, RainCast was generally good at distinguishing between wet and dry areas although there were some overestimates (e.g., where the QPE showed it was dry, RainCast indicated a probability of light or moderate precipitation). However, RainCast struggled to produce satisfactory results for moderate-heavy rainfall intensities (e.g., 15 mm for a 3-hour period), especially when the lead time was longer than 3 hours.

RainCast-derived deterministic forecast vs other NWPs
To compare RainCast's probabilistic forecasts with the deterministic rain forecasts from other models (i.e., IFS, GFS, UM and WRF) used at MetService, three RainCast percentage probability levels were chosen to function as pseudo "deterministic" rain forecasts. The percentage thresholds extracted from RainCast for this purpose were 25%, 50% and 75%.
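The conversion from a probability field to a pseudo-deterministic forecast can be sketched as a simple mask, shown below. This assumes that a grid cell is treated as "rain forecast" when the probability of exceeding the rainfall threshold is greater than the chosen level; the function name and the sample values are hypothetical.

```python
import numpy as np

def pseudo_deterministic(prob_exceed, level):
    """Return a yes/no rain mask: True where the forecast probability of
    exceeding the rainfall threshold is greater than `level` (0 to 1)."""
    return np.asarray(prob_exceed) > level

# Hypothetical probabilities of exceeding one rainfall threshold.
prob = np.array([[0.10, 0.30], [0.60, 0.90]])
masks = {lvl: pseudo_deterministic(prob, lvl) for lvl in (0.25, 0.50, 0.75)}
```

A lower level marks more cells as wet, so it tends to raise both hits and false alarms, while a higher level gives a drier, more conservative forecast.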

As already noted, in contrast to the QPE there was significant rainfall predicted by many of the models (including RainCast) over eastern Bay of Plenty. A reason for this is that radar coverage of the area is compromised. The distance to the radar station in the western Bay of Plenty means that low-tropospheric rain cannot be observed well, so the QPE underestimates the rain, while the radar station in Hawke's Bay suffers from beam blocking in the lowest elevations to the north/northwest. Installing and using more rain gauges in the radar rainfall calibration process could provide a better QPE map, which is a possibility for future improvement.

Similar to Figure 8, when the lead time was 24 hours (Figure 9), IFS and the two RainCast forecasts (>25% and >50%) seemed generally to outperform the other models. However, these three approaches under-forecasted rain along the coast of Waikato, although the RainCast (>25%) was slightly better than the other two. There are massive overestimates in the Bay of Plenty compared to the QPE. Some spurious showers were present in the Manawatu-Wanganui region from IFS and RainCast (>25%), which were successfully eliminated when the probability threshold increased to 50% in RainCast. In contrast, the rest of the models (e.g., GFS, UM and all WRFs) did not give good predictions for a lead time of 24 hours.

The above indicates that RainCast was overall able to provide improved forecasts compared to most individual NWP approaches, especially over a short range (e.g., < 3 hours). For this event IFS gave comparable results to RainCast. However, to evaluate the ability of RainCast over a longer period, a more quantitative and less subjective evaluation approach must be adopted, which is described in Section 4.

Objective verification
Section 3 provides a subjective evaluation of the event that occurred between 00Z and 03Z on 14 June. However, such an evaluation is not carried out quantitatively, and it is prone to individual biases of interpretation (e.g., Stanski et al., 1989). To objectively validate the accuracy of forecasts, it is more useful to calculate the Fractions Skill Score (FSS) over a longer period, as presented here for the period 00Z 1 June 2021 to 30 June 2021. During this period numerous rain-producing systems of varied intensities affected the country (the case presented in Section 3 is one of the events that occurred during this period).
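For reference, a minimal sketch of the FSS for a single forecast and observation pair is given below, using the standard definition $\mathrm{FSS} = 1 - \sum (P_f - P_o)^2 / (\sum P_f^2 + \sum P_o^2)$, where $P_f$ and $P_o$ are the forecast and observed fractions of cells exceeding the threshold within a neighbourhood. The window width `n`, the zero padding at the edges and the function names are assumptions of this sketch; `n` is taken here to correspond to the number of verification grids.

```python
import numpy as np

def fractions(binary, n):
    """Fraction of exceedances in an n x n window (n odd), zero-padded at the edges."""
    r = n // 2
    padded = np.pad(binary.astype(float), r)
    ny, nx = binary.shape
    total = np.zeros((ny, nx))
    for dy in range(n):
        for dx in range(n):
            total += padded[dy:dy + ny, dx:dx + nx]
    return total / (n * n)

def fss(forecast, observed, threshold_mm, n=1):
    """Fractions Skill Score for rainfall >= threshold_mm with an n x n window."""
    pf = fractions(forecast >= threshold_mm, n)
    po = fractions(observed >= threshold_mm, n)
    mse = np.mean((pf - po) ** 2)
    ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / ref if ref > 0 else float("nan")

# A forecast displaced by one cell from the observation.
obs = np.zeros((5, 5)); obs[2, 2] = 5.0
fc = np.zeros((5, 5)); fc[2, 3] = 5.0
score_n1 = fss(fc, obs, threshold_mm=1.0, n=1)   # no overlap at grid scale
score_n3 = fss(fc, obs, threshold_mm=1.0, n=3)   # partial credit in a 3 x 3 window
```

In this example the score rises from 0 with a single grid to 2/3 with a 3 x 3 window, which illustrates why the skill of a slightly displaced forecast improves as the verification scale increases.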

One of the challenges of applying the FSS is the different spatial resolutions of the models.

At a threshold of 0.5 mm, RainCast (>25%) outperformed all the other models over the entire 24 hours (Figures 10A and 10B). IFS produced the highest skill score among the global NWP models. UM scored the lowest over the aggregation period with 1 verification grid; however, its performance improved when the number of verification grids increased. For example, UM performed better than RainCast (>75%) when the grid increased to 5 (Figure 10B). Verification also indicates that RainCast (>75%) gave a low number of both "hits" and "false alarms", and on average its skill is lower than most other forecast approaches at such a small verification threshold.

Moreover, the skill of UM increased with increased verification scale. For example, UM performed slightly better than the nz4kmN-UKMO when the number of verification grids increased to 5, especially after T+12 hours.

When the verification threshold is increased, forecast skill reduces significantly (Figures 10E and 10F). For example, the average skill for all forecast approaches dropped from 0.65 (threshold of 0.5 mm using 1 verification grid, Figure 10A) to 0.13 (threshold of 15.0 mm using 1 verification grid, Figure 10E). Similar to the verification with a threshold of 7.5 mm, RainCast (>75%) on average still performed the best from the FSS point of view, especially for the first 3-6 hours. In contrast to the verifications with lower thresholds (Figures 10A, 10B, 10C and 10D), IFS outperformed RainCast (>50%). On the other hand, RainCast (>25%) did not provide satisfactory results due to its overestimates at such a threshold.

It is worthwhile to note that the objective verification (FSS) used in this paper does not necessarily reflect the "goodness" or "usefulness" of the forecast. From an operational forecast point of view, "consistency", "quality" and "value" are the three essential metrics which determine the usefulness of a forecast (e.g., Murphy, 1993, 1995), while objective verification mainly provides a reference for the forecast quality.

RainCast frequently uses the latest available observations, and this inevitably means that the user will have to expect more inconsistencies between updated forecasts from cycle to cycle. Moreover, since RainCast is a statistical system and is not constrained by the physical laws that govern NWP-based systems, there could be discrepancies between the output from RainCast and the forecaster's best judgement about the situation, which usually takes many meteorological factors into account.

Additionally, the above objective verification is carried out and averaged over the entire New Zealand domain, which may not optimally reflect the situation in an area or areas of greater meteorological significance (e.g., a thunderstorm in a city is likely to have greater impact than one which occurs in an uninhabited area). Therefore, even though the FSS has value in quantifying how RainCast performs, it will take longer in an operational environment to fully realise the benefits that can be gained by decision makers using its forecasts.

The skill of RainCast was demonstrated in both Section 3 and Section 4. In this section, the system limitations and potential improvements are discussed.

RainCast uses radar data to produce extrapolated nowcasts, and to evaluate the initial conditions of the different NWP models for the current hour (or latest RainCast analysis time). The QPE data, which are used to verify the results from RainCast and other forecast models, are also largely dependent on radar data. Therefore, the quality of radar data plays an essential role in the running of RainCast, and consequently the skill of the system is dependent on it, especially for the first few hours.

The topography of New Zealand is complex, especially in the South Island. For example, the Southern Alps, which are approximately 800 km long and more than 60 km wide, run almost the whole length of the island and therefore form an effective barrier between its western and eastern areas. Consequently, New Zealand's complex orography often obstructs low-level radar beams, degrading the performance of radar-based nowcasting (e.g., Foresti and Seed, 2015).

Another major uncertainty from the use of radar data is the Z-R relationship. At MetService, a customized Z-R relationship is used to convert radar-observed reflectivity to rainfall amounts. This Z-R relationship is updated hourly by regressing reflectivity (dBZ) against gauge-observed rainfall (not presented in this paper). It has been demonstrated in an operational environment that this dynamically adjusted Z-R relationship matches the rain gauge records for New Zealand more accurately than the Marshall-Palmer relationship $Z = 200R^{1.6}$ (Marshall and Palmer, 1948), but it still underestimates many significant events, especially when the reflectivity is greater than 20-25 dBZ.

Since a lot of rain gauge data cannot be retrieved in a timely manner to correct the Z-R relationship for use in RainCast, there will consequently be underestimates in the extrapolation-based nowcasts, and thus a downgrade in the performance of RainCast, especially for the first 1 to 3 hours. This issue will also affect RainCast's initial conditions and forecast evaluation, since the QPE, which is largely derived from radar data, is here considered as the "ground truth". An improved radar quality control system is therefore recommended to further improve the skill of RainCast.

As described in Section 2, there are multiple NWP models used in RainCast and they are blended with an extrapolation-based nowcast. The contribution from an individual NWP model is dependent on its skill over a predefined aggregation period (usually between T+0h and T+12h, where T+0h is the RainCast analysis time). However, such skill does not necessarily reflect the model's performance for subsequent hours, especially if there are changes to how a weather situation evolves. Moreover, this skill is estimated over the entire RainCast domain, and the domain average skill may not represent a particular area of interest (e.g., airports, urban centres with dense populations and areas exposed to the highest risk of flooding). One of the potential improvements that could be made in the future is to run RainCast over a smaller domain specifically for the area with the most meteorological value.
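The dependence of each model's contribution on its recent domain-wide skill can be sketched as follows. This is an assumed scheme for illustration: skill is taken to be the Pearson correlation against the QPE over the whole aggregation window and domain, and weights are the normalised non-negative correlations. The function name and data layout are hypothetical, and the actual RainCast weighting may differ.

```python
import numpy as np

def model_weights(forecasts, qpe):
    """forecasts: dict name -> array (n_hours, ny, nx); qpe: array of the same shape.

    Returns dict name -> weight, based on the Pearson correlation with the QPE
    over the whole window and domain. Negative or undefined correlations get
    zero weight; if no model shows skill, equal weights are returned.
    """
    skills = {}
    for name, fc in forecasts.items():
        r = np.corrcoef(fc.ravel(), qpe.ravel())[0, 1]
        skills[name] = max(float(r), 0.0) if np.isfinite(r) else 0.0
    total = sum(skills.values())
    if total == 0.0:
        return {name: 1.0 / len(skills) for name in skills}
    return {name: s / total for name, s in skills.items()}

# Hypothetical 12-hour window on a small grid.
rng = np.random.default_rng(0)
qpe = rng.random((12, 4, 4))
weights = model_weights({"good": 2.0 * qpe + 1.0, "bad": -qpe}, qpe)
```

Restricting `forecasts` and `qpe` to a sub-domain before calling such a function is one way the weights could be made to reflect a particular area of interest rather than the domain average.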

The hourly updated RainCast system at MetService has been described. The algorithm for this system is adapted from STEPS with a neighborhood processing method applied. This system is in the process of being implemented operationally at MetService. Evaluation suggests that, compared to the NWP alone approach, RainCast could improve precipitation forecasts in New Zealand out to 24 hours, particularly for light to moderate rainfall events, with the greatest improvements in the first 1-3 hours.

Regions of the North Island of New Zealand (courtesy of LGNZ: https://www.lgnz.co.nz), and locations of the rain radars (red crosses).

Figure 4
Dynamically adjusted weights at 00Z 14 June 2021 for all available NWP models.

Figure 5
Correlations between the extrapolation-based nowcasts (blue) and NWP (red) for different spatial scales (updated at 00Z 14 June 2021).