Convection-permitting modeling strategies for simulating extreme rainfall events over Southeastern South America

A set of six convection-permitting (CP) domain configurations were implemented to perform 72-hour long simulations of three extreme precipitation events over Southeastern South America (SESA). The goal of the study is to determine the most adequate configuration for reproducing not only the rainfall evolution and intensity, but also the synoptic triggering mechanisms that led to these extreme events, taking into account the trade-off between model performance and computational cost. This study assesses the impact of (1) the horizontal resolution in the CP domain, (2) the horizontal resolution of the driver domain, (3) the size of both CP and driver domains and (4) the nesting strategy (one-step versus two-step nesting). Each simulation was performed with the Weather Research and Forecasting model driven by the ERA-Interim reanalysis. For each event and domain configuration, a 6-member physics ensemble is built, making a total of 36 simulations for each event. No significant differences were found between the 4 km and 2.4 km CP ensembles. Increasing the horizontal resolution of the driver domain from 20 km to 12 km introduced only subtle differences. Increasing the size of the CP domain improved the model performance, probably because of better resolved topography and, hence, better resolved synoptic environment. 
The results of this study reveal that the one-step nesting CP ensemble at 4 km horizontal resolution, covering an area of 29° x 21° (lon-lat), emerges as the optimal domain configuration among those tested to simulate extreme precipitation events over SESA.


Introduction
Southeastern South America (SESA) hosts some of the most extreme convective storms on the planet (Zipser et al. 2006), accounting for more than 70% of the total extended summer precipitation in the region. This makes extreme precipitation events critically relevant, not only because of the high vulnerability of the population and of socio-economic activities (Vörösmarty et al. 2013), but also because extreme precipitation has been increasing in frequency and intensity during the last decades (Penalba and Robledo 2010; Cerón et al. 2020; Olmo et al. 2020). Additionally, the frequency and intensity of these events are expected to continue increasing in response to future global warming, as revealed in several studies based on either global climate models (GCMs) or regional climate models (RCMs) (Chou et al. 2014; Blazquez and Solman 2020).
Though several studies have demonstrated the added value of RCMs in reproducing regional scale phenomena and precipitation-related features over several areas of the world (Torma et al. 2015 for Europe; Falco et al. 2019 and Solman and Blazquez 2019 for South America, among others), RCMs are still deficient in reproducing extreme precipitation features, mostly because of the limitations of the convective schemes (Prein et al. 2015). Particularly over SESA, RCMs have deficiencies in reproducing the intensity of extreme rainfall events and the extension of the region where the most intense events occur (e.g. Solman et al. 2013; Solman 2016; Solman and Blazquez 2019; Olmo and Bettolli 2021, among others). Hence, there is an urgent need for implementing new modelling strategies to improve the capability of reproducing one of the most important climatic features in the region.
Advances in computational development allowed the implementation of models operating at convection-permitting (CP) resolutions, of the order of a few kilometers (Prein et al. 2015, 2020, and references therein). Several studies showed that CP models (CPMs) outperform coarser RCMs in terms of their capacity to capture the diurnal cycle of convective summer precipitation, the intensity of extreme precipitation events and orography-triggered convection (Matsudo et al. 2015; Prein et al. 2013; Mahoney et al. 2012; Kendon et al. 2012). In early stages, these simulations were developed in the framework of numerical weather prediction, in which simulations were performed for a few days covering the development of a specific event (e.g. Mahoney et al. 2012). More recently, CP climatic simulations covering a decade or more have been performed over different regions within Europe (e.g. Ban et al. 2020; Berthou et al. 2020; Kendon et al. 2012) and North America. These studies highlight the benefits of CPMs in representing extreme precipitation features. CPMs have also been used to explore how extreme rainfall may change under future warming scenarios (e.g. Kendon et al. 2014, 2016; Fosser et al. 2016; Ban et al. 2015; Rasmussen et al. 2020), showing that these models project larger increases in extreme rainfall compared with coarser models. Due to the high computational cost of CP climatic simulations, most CP modelling exercises have been largely based on a single modelling approach. Only recently has a coordinated effort based on decade-long multi-CP model simulations started to emerge, mostly for domains over Europe (e.g. Coppola et al. 2020; Ban et al. 2021; Pichelli et al. 2021).
Given the computational demand for performing ensemble simulations for a decade or more with CPMs, studies based on multiple CPMs simulating shorter periods, centered on single extreme precipitation events or on a single rainy season, also emerged over several regions of the world, including Europe, North America, Asia and Africa (e.g. Pall et al. 2017; Hibino et al. 2018; Coppola et al. 2020; Li et al. 2018, 2019; Yang et al. 2017; Matsui et al. 2020). For South America, there are only some preliminary studies (e.g. Lavín Gullón et al. 2021; Solman et al. 2021) that highlight the benefits of CP models in capturing the main features of individual extreme events and their synoptic forcings. The studies by Lavín Gullón et al. (2021) and  arose as a collaborative effort in the context of a CORDEX Flagship pilot study (https://cordex.org/), which produced a 6-month simulation performed with two CP RCMs, allowing for exploring model uncertainty in capturing selected extreme events, while Solman et al. (2021) assessed the quality of a single model operating at CP resolution in capturing the main features of a collection of extreme precipitation events in a single event approach. The domain of the CP simulations in these studies spanned roughly 12° x 12° with approximately 350 x 350 grid points, and the simulations were performed in a two-step nesting approach, with the CPM operating at 4 km resolution, nested into a 20 km-resolution domain. The results arising from these experiments showed that, though the CP domain covered a large part of the region where extreme events developed, several events fell too close to the lateral boundaries, suggesting that a larger CPM domain may be more adequate. Furthermore, the two-step nesting approach introduced some deviations in the low-level circulation as a result of convective processes occurring close to the boundaries of the CP domain, modulating the main synoptic drivers of the events developing within the CPM domain.
As discussed in Brisson et al. (2015), both the nesting strategy and the domain size represent two key elements in the design of CPM simulations, with important consequences for both the quality and the computational cost of the simulations. Though studies assessing the sensitivity of CPM simulations to domain size are still few, it is recommended that the domain be large enough to allow for a spatial spin-up of roughly 150 km. However, an exceedingly large domain can generate deviations from the lateral boundary conditions (LBC) and have undesired effects at the outflow boundary (Prein et al. 2015). Moreover, given the control of the synoptic drivers on the development of organized convection in SESA (Lavín Gullón et al. 2021; Solman et al. 2021, among others), the domain size in CPMs in the region should be large enough so that the forcing mechanisms leading to the occurrence of extreme rainfall events, often modulated by topographic features, are included in the domain.
The nesting strategy is another important source of model uncertainty and impacts the computational cost of a CPM simulation as well. Most of the studies based on CPM simulations are performed using either three-step nesting in a telescoping mode (e.g. Fosser et al. 2014), two-step nesting (e.g. Ban et al. 2021) or one-step nesting in which the CPM is driven by the reanalysis (e.g. Berthou et al. 2020). According to Brisson et al. (2015), multiple nestings may deteriorate the quality of a CPM simulation. On the other hand, in a one-step nesting approach, the resolution jump between the model providing the initial and lateral boundary conditions and the CP model impacts the spatial spin-up, i.e., the distance from the boundaries of the CP domain that should be discarded (Matte et al. 2017). Therefore, a sensitivity analysis to the nesting configuration needs to be performed to find the optimum CPM configuration for the SESA region.
Finally, the horizontal resolution in the CPM simulations is another key aspect to consider. CPM simulations operate at km-scale resolutions, from 4 km (the upper limit for convection permitting simulations) to roughly 1 km (Lucas-Picher et al. 2021). However, the benefits of higher resolution in the CPM simulations are not fully explored. Considering that the choice of the spatial resolution also impacts on the computational cost of a simulation, a sensitivity analysis of the extent to which higher resolution implies better model performance needs to be tackled.
A careful design of CPM simulations, including the domain location, the domain size, the spatial resolution and the nesting strategy, should be considered to identify the optimal model configuration that allows, on one hand, capturing the extreme precipitation events of a given region and the mechanisms that contribute to trigger, develop and sustain the deep moist organized convection and, on the other hand, accounting for the tradeoff with computational costs. The experimental design described below will allow answering specific questions such as: Does the resolution of the driving model matter? Is there any significant improvement when increasing the resolution of the CPM simulations? Does the nesting strategy affect the quality of the CPM simulation? Does the domain size of the CPM simulation have an impact on the ability to capture the evolution of an extreme event?
The goal of this study is to identify the optimal modelling strategy for performing computationally feasible CPM simulations able to represent the main features of extreme precipitation in SESA. Given the key forcing mechanisms leading to the formation of organized convection and subsequent extreme precipitation over SESA, including the northerly wind channeled by the Andes favoring moisture and heat flux together with a midlevel trough located over the Andes and an upper level jet stream with its associated upper level divergence (Rasmussen and Houze 2016, and references therein), it is important to explore the extent to which including these forcing mechanisms within the CPM domain translates into a good representation of extreme events. With this aim, a series of short-term simulations have been performed with the WRF RCM (version 3.9.1; Skamarock et al. 2008) for a variety of domain sizes, horizontal resolutions and nesting strategies for a set of individual extreme precipitation events registered in SESA. The individual event approach allows for exploring the capability of the model to capture not only the event but also the corresponding triggering mechanisms. Identifying a CPM set-up with good performance and the lowest computational cost is needed before starting longer term simulations in the region.
The manuscript is organized as follows. Sect. 2 describes the datasets used for model validation, the model used, the experiment design and the metrics defined to assess model performance. Sect. 3 describes the capability of the model to reproduce the selected events for the variety of model configurations and nesting strategies. The focus of the evaluation is on the main features of the extreme rainfall event and the synoptic forcing mechanisms. Finally, Sect. 4 presents a summary of the main results and a discussion.

Datasets for validation
For precipitation validation, three satellite precipitation estimates are used: the NOAA's Climate Prediction Center Morphing Technique bias-corrected product (Joyce et al. 2004, CMORPH), the Global Precipitation Measurement Mission Integrated Multisatellite Retrievals calibrated precipitation Level-3 Final Run (Huffman et al. 2019, IMERG) and the Multi-Source Weighted-Ensemble Precipitation V2.1 estimate dataset (MSWEP) described in Beck et al. (2018). These datasets are based on integrating different types of satellite and ground station data. Some studies using CMORPH over SESA have reported wet biases in both extreme and weak precipitation (e.g. Salio et al. 2014; Matsudo et al. 2015; Demaria et al. 2011). Biases similar to those found for CMORPH were found for IMERG data by Cui et al. (2019), though in assessing rainfall associated with mesoscale convective systems (MCSs) over the Great Plains in the USA. MSWEP merges gauge, satellite and reanalysis products and has been shown to perform well at monthly and seasonal scales over several regions of the world (Beck et al. 2017). Table 1 lists information on the spatial and temporal resolution of the selected datasets.
The ERA-Interim reanalysis dataset at 0.75° x 0.75° spatial resolution (Dee et al. 2011) was used to provide the initial and lateral boundary conditions for the simulations described in Sect. 2.3. For evaluating the model performance in terms of the triggering circulation patterns for individual extreme precipitation events, both the ERA-Interim reanalysis and the ERA5 reanalysis (Hersbach et al. 2020) were used. ERA5 was included in the evaluation in order to compare the modelled circulation features against a reanalysis dataset operating at higher resolution (roughly 31 km) and with better performance compared with the driving ERA-Interim reanalysis (Hersbach et al. 2019). Note that by the time ERA5 was released, several of the simulations included in this study had already been performed with ERA-Interim; therefore, we decided to use ERA-Interim to drive the model for all the simulations in order to keep consistency.

Event selection
Daily rainfall data from the gridded Tropical Rainfall Measurement Mission (TRMM) 3B42 V7 (Huffman et al. 2011) dataset for the period 2000 to 2017 were used to identify the extreme rainfall events. First, the 99th percentile of rainy days (rainfall above 1 mm/day) was computed for every grid point (at 0.25° x 0.25° horizontal resolution) within the SESA region. Extreme precipitation on a given day is defined when precipitation is above the 99th percentile at a given grid point and at adjacent grid points within a region of 1.25° x 1.25° size (i.e. 5x5 sized running windows in the TRMM dataset). These criteria allowed identifying extreme events with a minimum spatial extension accounting for organized convection in the region. After applying these criteria, three extreme events were selected, taking into account that the events developed entirely within the smallest CP domain and differ in terms of their location, life cycles and triggering mechanisms. A detailed description of the events can be found in Sect. 3.1. Though the selection of the extreme precipitation events was performed using TRMM, we decided to use higher resolution datasets for assessing model performance.
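The selection criteria above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the data are synthetic, and the 5x5 criterion is read here as "every grid point in the window exceeds its own 99th percentile", which is only one possible interpretation of the text.

```python
import numpy as np

def rainy_day_percentile(daily, wet_threshold=1.0, q=99.0):
    """Per-grid-point percentile of rainy days only.

    daily: array (time, lat, lon) in mm/day. Days at or below
    wet_threshold are masked out before the percentile is taken.
    """
    wet_only = np.where(daily > wet_threshold, daily, np.nan)
    return np.nanpercentile(wet_only, q, axis=0)

def extreme_window_centres(day, p_thresh, window=5):
    """Mark centres of window x window boxes in which every grid point
    exceeds its local percentile (assumed reading of the criterion)."""
    exceed = day > p_thresh
    half = window // 2
    ny, nx = exceed.shape
    centres = np.zeros((ny, nx), dtype=bool)
    for j in range(half, ny - half):
        for i in range(half, nx - half):
            box = exceed[j - half:j + half + 1, i - half:i + half + 1]
            centres[j, i] = box.all()
    return centres

# Synthetic demo: 200 "days" with rainfall 0..199 mm/day at every grid point.
days = np.arange(200, dtype=float)
daily = np.tile(days[:, None, None], (1, 10, 10))
p99 = rainy_day_percentile(daily)

# One test day with a 5x5 block of very heavy rain.
day = np.zeros((10, 10))
day[2:7, 2:7] = 199.0
centres = extreme_window_centres(day, p99)
```

On a 0.25° grid such as TRMM 3B42, a 5x5 window corresponds to the 1.25° x 1.25° region mentioned in the text.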

The WRF model and experimental design
The Weather Research and Forecasting model (WRF; Skamarock et al. 2008) version 3.9.1 was used in this study to perform short-term (72-hour) simulations for the selected extreme events. For each event, the model was initialized at 00 UTC of the day before the occurrence of the extreme event, and the first 6 hours of each simulation were considered spin-up time and therefore discarded from the analysis. Both initial and lateral boundary conditions (LBC) were provided by the 0.75° horizontal resolution ERA-Interim reanalysis. All simulations were performed with 39 vertical levels using a Mercator projection. The relaxation zone at the boundaries has an extension of around 0.75°, adjusted for every horizontal resolution for the parent domain (7 gridpoints for the driver domain at 12 km, 5 for the 20 km domains and 20 for the 4 km domains). For each event and for each model configuration, a physics ensemble was built to account for physics uncertainty. The physics ensemble has been built considering the processes to which the development of deep moist convection is expected to be most sensitive, including the planetary boundary layer, the microphysics and the shallow cumulus convection. The choice of the cumulus scheme is included only because it may provide a clue on the extent to which the two-step nesting approach may be associated with larger uncertainty compared with the one-step nesting approach. The physics ensemble comprises 6 members, as described in Table 2. Every ensemble member shares the radiation schemes (Dudhia and RRTM for shortwave and longwave radiation, respectively; Dudhia 1989; Mlawer et al. 1997) and the land surface scheme (UNOAH; Tewari et al. 2016). The surface layer scheme is paired with the PBL scheme: for YSU the Rev.MM5 scheme (Jimenez et al. 2012) was used; otherwise the ETA S (Janjic 2002) was chosen.
The three selected events were simulated using a set of different domain configurations, including different nesting strategies (two-step nesting vs one-step nesting), different domain sizes for both the two domains in the two-step nesting approach and the single domain for the one-step nesting approach, and different spatial resolutions (including the spatial resolution of the outer domain in the two-step nesting approach and the convection-permitting domain). The choice of the domains for assessing both the impact of the domain size and the impact of the nesting strategy is based on a tradeoff between including topographic forcings in the convection-permitting simulations (and therefore enlarging the domain) and the computational cost of the simulations. This experimental setup includes three domain sizes referred to as D1, D2 and D3. For each domain size, the two-step nesting includes the outer domain driven by the reanalysis, referred to as the driving domain (DR), which provides the lateral boundary conditions to a smaller domain at convection-permitting resolution (CP), corresponding to the CP domain. Accordingly, the difference in the physics choices between DR and CP domains is that the convective scheme is switched on or off, respectively. Figure 1 shows the driving and the convection-permitting-resolution domains for the three domain sizes and Table 3 summarizes the horizontal resolution and number of horizontal gridpoints of the simulations. Considering all combinations displayed in Tables 2 and 3, a total of 36 simulations have been performed for each individual event, with 6 ensemble members for 6 different model set-ups.
Table 2 (scheme references recovered from the table footnote): Lim and Hong (2010); KF, Kain (2004); GF, Grell and Freitas (2014); MJY, Janjic (1994); YSU, Hong et al. (2006); MYNN3, Nakanishi and Niino (2009).
Simulations performed at D1 include the DR1 domain with a horizontal resolution of 12 km (roughly 50 K grid points) and the CP1 domain at two horizontal resolutions: 2.4 km (roughly 300 K grid points) and 4 km (roughly 100 K grid points), respectively. These CP simulations allow exploring to what extent the horizontal resolution of the CP domain matters. The CP1 simulations cover an area of approximately 13° x 12° (longitude-latitude). Simulations performed at D2 differ from D1 in the size of the domain and also in the nesting strategy. For the two-step nesting configuration, not only does the DR2 domain extend over a much broader area compared with DR1, allowing for a better representation of the large scale forcings entering the CP domain, but the western boundary of the CP2 simulation is also located off the coast of Chile, hence allowing for a proper interaction between the inflow at that boundary and the complex Andes topography. Moreover, the CP2 domain includes the Sierras de Córdoba over central Argentina, where convection usually starts. Given the dominant role of the Andes in modulating the westerly flow and given the control that the Sierras de Córdoba exert on the triggering mechanisms for convective processes to initiate, this domain configuration allows exploring the relevance of including the main topographic features controlling the mechanisms associated with the occurrence of organized convection in SESA. The CP2 domain is implemented only at 4 km horizontal resolution, with roughly 280 K grid points, and covers an area of approximately 25° x 16°. In order to explore the impact of the horizontal resolution in the two-step nesting configuration, DR2 was implemented at two horizontal resolutions, 12 km and 20 km, corresponding to roughly 150 K and 55 K grid points. Additionally, one of the physics ensembles performed in the CP2 domain is driven directly by the ERA-Interim reanalysis, hereafter referred to as the one-step nesting configuration.
This experiment is designed to explore the nesting strategy, i.e. two-step vs. one-step nesting. This last set-up also raises the issue of the spatial spin-up, since the resolution ratio between the model providing the lateral boundary conditions (LBC) and the CP domain is close to 18 and hence, a large number of grid points close to the boundaries needs to be discarded to avoid introducing noise due to the resolution jump. Finally, D3 is configured in a one-step nesting approach, similarly to one of the experiments in the D2 set-up, but with a larger domain, spanning 29° x 21°. The aim of exploring this domain is to enlarge the CP domain in order to include the main topographic forcings within the CP domain. Note that the CP3 domain is not only larger than the CP2 configured in the one-step nesting approach, but its northern boundary is located up to 15°S, allowing for better capturing the low-level northerly moisture flow, a key ingredient to develop deep convection. In summary, three convection-permitting domains are defined: for the two two-step nesting simulations, one of the CP domains does not include the Andes and one does; for the one-step nesting approach, the Andes lie within the model domain but the northern and eastern boundaries are both shifted in order to avoid the events developing too close to the boundaries and therefore, having too much influence from the lateral boundary conditions. The experimental set-up described above is also framed considering the tradeoffs between domain configuration, including domain size, resolution and nesting strategy, and computational costs. It is also worth recalling that the larger the number of nesting steps, the higher the computational cost of the simulations.
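The resolution jump discussed for the one-step nesting can be illustrated with simple arithmetic. The text does not say how the ratio of about 18 was obtained; the sketch below assumes a spherical Earth and the zonal spacing of a 0.75° grid at 30°S (a latitude chosen here only because it lies within SESA). The helper names are hypothetical.

```python
import math

# Approximate length of one degree of latitude on a sphere of radius 6371 km
# (an assumption made for this illustration).
KM_PER_DEGREE = 2.0 * math.pi * 6371.0 / 360.0

def zonal_spacing_km(spacing_deg, lat_deg):
    """Zonal grid spacing in km of a regular lat-lon grid at a given latitude."""
    return spacing_deg * KM_PER_DEGREE * math.cos(math.radians(lat_deg))

def resolution_ratio(parent_deg, child_km, lat_deg):
    """Ratio between the parent (driving) spacing and the nested grid spacing."""
    return zonal_spacing_km(parent_deg, lat_deg) / child_km

# ERA-Interim (0.75 deg) driving each grid spacing used in the study.
ratios = {dx: resolution_ratio(0.75, dx, -30.0) for dx in (4.0, 12.0, 20.0)}
for dx, r in ratios.items():
    print(f"{dx:5.1f} km grid: resolution ratio ~ {r:.1f}")
```

Under these assumptions the jump is about 18 for the 4 km grid, against about 6 and 3.6 for the 12 km and 20 km driving domains, which is why the one-step set-up requires a wider spatial spin-up band.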

Metrics and analysis
The focus of this study is on evaluating the extent to which different model configurations capture extreme precipitation events together with their triggering forcing mechanisms. Hence, the analysis is centered on the main features of three precipitation events in terms of their intensity, location and temporal evolution and on the low-level circulation patterns, namely, the meridional component of the wind and the geopotential field at 850 hPa. Given the dominant role of the moisture flux convergence on the development of the extreme events over SESA (Lavín Gullón et al. 2021;Solman et al. 2021), the vertically integrated moisture flux for each of the selected events has also been evaluated. The analysis is focused on the low-level circulation since it is at the lower levels of the atmosphere where the regional forcings associated with topographical features may have the largest impact.
The skill of the models in representing the precipitation events is assessed with the Fractional Skill Score (FSS; Roberts and Lean 2008). The FSS is a metric based on a neighborhood approach that measures how the skill of the simulation varies with the spatial scale at which simulations and observations are being compared. The FSS compares the observed and modeled fraction of grid points with precipitation above a certain threshold within a running square domain of varying size, ranging from a single grid cell to twice the size of the simulation domain (to cover the whole simulation domain regardless of where the running square is centered). The FSS ranges from 0 to 1, with 1 indicating perfect skill. Roberts and Lean (2008) defined a critical FSS that indicates the minimum value of FSS that should be reached for a skillful prediction. The FSS generally increases monotonically with the spatial scale; hence, the smaller the scale at which the FSS equals the critical FSS, the better the model performance, indicating that the model is able to capture the fractional precipitation occurring at smaller scales. The minimum spatial scale is defined as the scale at which the FSS reaches the skillful threshold.
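For reference, the score described above can be written compactly. The expressions below follow the standard formulation in Roberts and Lean (2008) as recalled here; the notation is an assumption of this sketch: $O_{n,i}$ and $M_{n,i}$ are the observed and modeled fractions of grid points above the threshold within a square of side $n$ centered on grid point $i$, $N$ is the number of grid points in the domain, and $f_0$ is the observed fraction of the domain above the threshold.

```latex
\mathrm{FSS}(n) = 1 -
  \frac{\frac{1}{N}\sum_{i=1}^{N}\left(O_{n,i} - M_{n,i}\right)^{2}}
       {\frac{1}{N}\left[\sum_{i=1}^{N} O_{n,i}^{2} + \sum_{i=1}^{N} M_{n,i}^{2}\right]},
\qquad
\mathrm{FSS}_{\mathrm{crit}} = 0.5 + \frac{f_{0}}{2}
```

With a threshold set at a high spatial percentile of the observations, $f_0$ is small and the critical value sits just above 0.5.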
In order to account for the timing and intensity of the precipitation events, the FSS was calculated for each member of the ensemble against every observational dataset every 3 hours during the whole event. The minimum spatial scale at which the model is skillful is computed at each timestep. If the model is not skillful, it does not reach a minimum spatial scale. The precipitation threshold selected for the calculation of the FSS is the 95th spatial percentile of the observed 3-hourly precipitation. Therefore, for each event, each ensemble member and each observational dataset, the minimum spatial scale was computed, and the distribution of the minimum skillful scale for every set of CPM simulations is represented in a box-plot.
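The procedure for one time step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, scales are expressed in grid cells rather than degrees, points outside the domain are treated as dry (zero padding), the percentile is taken over all grid points of the observed field, and the critical value 0.5 + f0/2 is the usual Roberts and Lean (2008) choice.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of exceeding points in the n x n square centered on each cell."""
    lo = n // 2
    hi = n - 1 - lo
    padded = np.pad(binary.astype(float), ((lo, hi), (lo, hi)))
    # Summed-area table gives every window sum without explicit loops.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ny, nx = binary.shape
    window_sum = (sat[n:n + ny, n:n + nx] - sat[:ny, n:n + nx]
                  - sat[n:n + ny, :nx] + sat[:ny, :nx])
    return window_sum / (n * n)

def fss(obs, mod, threshold, n):
    """Fractional Skill Score at neighbourhood size n (in grid cells)."""
    fo = neighbourhood_fraction(obs > threshold, n)
    fm = neighbourhood_fraction(mod > threshold, n)
    mse = np.mean((fo - fm) ** 2)
    ref = np.mean(fo ** 2) + np.mean(fm ** 2)
    return 1.0 - mse / ref if ref > 0 else float("nan")

def min_skillful_scale(obs, mod, scales, pct=95.0):
    """Smallest scale at which FSS reaches the critical value, else None."""
    threshold = np.percentile(obs, pct)
    f0 = np.mean(obs > threshold)
    critical = 0.5 + f0 / 2.0
    for n in scales:
        if fss(obs, mod, threshold, n) >= critical:
            return n
    return None

# Synthetic demo: the "model" places the same rain cell three columns east.
obs = np.zeros((20, 20))
obs[5:8, 5:8] = 10.0
mod = np.zeros((20, 20))
mod[5:8, 8:11] = 10.0
scales = range(1, 2 * 20, 2)  # odd window sizes up to about twice the domain
scale_shifted = min_skillful_scale(obs, mod, scales)
scale_perfect = min_skillful_scale(obs, obs, scales)
```

Repeating this for every 3-hourly step, ensemble member and observational dataset yields the sample of minimum skillful scales that the text summarizes in box-plots; a `None` corresponds to a time step at which the simulation is not skillful at any scale.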
Because the FSS compares observed and simulated precipitation, every simulation and observational dataset was bilinearly interpolated onto a common grid of 0.1° x 0.1°. This resolution corresponds to the lowest resolution among the observational datasets. Thus, an FSS is calculated for every running square from 0.1° size (gridpoint size) to 26.5° size (twice the size of the common domain of the CP simulations). This interpolation implies an upscaling of the CP simulations and, therefore, an apparent loss of high resolution information, but it is still expected that high resolution information is transferred into the upscaled lower resolution domain (Torma et al. 2015; Fantini et al. 2018). This interpolation was also applied to adequately compare the precipitation fields between the simulations and the observational datasets. Similarly, for the synoptic fields, a bilinear interpolation to the ERA-Interim grid (0.75° x 0.75°) was performed.
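A bilinear interpolation onto a common 0.1° grid can be sketched as below. This is an illustration only: the function name and grid extents are hypothetical, and the approach assumes the source grid is rectilinear in latitude and longitude (1-D, increasing coordinate vectors), in which case bilinear interpolation reduces to two passes of 1-D linear interpolation.

```python
import numpy as np

def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear interpolation of field(lat, lon) onto a new rectilinear grid.

    Coordinates must be 1-D and increasing. np.interp holds the edge value
    outside the source range, so the target grid should lie inside it.
    """
    # Pass 1: interpolate each latitude row along longitude.
    along_lon = np.stack([np.interp(dst_lon, src_lon, row) for row in field])
    # Pass 2: interpolate each resulting column along latitude.
    return np.stack(
        [np.interp(dst_lat, src_lat, col) for col in along_lon.T], axis=1
    )

# Synthetic source grid and a common 0.1 deg x 0.1 deg target grid.
src_lat = np.linspace(-40.0, -20.0, 41)
src_lon = np.linspace(-70.0, -50.0, 81)
dst_lat = np.arange(-39.0, -21.0 + 1e-9, 0.1)
dst_lon = np.arange(-69.0, -51.0 + 1e-9, 0.1)

# A field that is linear in both coordinates is reproduced exactly.
field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]
regridded = bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon)
expected = 2.0 * dst_lat[:, None] + 3.0 * dst_lon[None, :]
```

For fields on a curvilinear grid a dedicated regridding tool would be needed instead; the two-pass shortcut above does not apply there.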

Results
In the following, an overview of the selected extreme precipitation events is presented. Then, the performance of the ensemble mean CPM simulations corresponding to the 6 model configurations (listed in Table 3) is assessed for each event in terms of their capability to reproduce the spatial distribution of the maximum 3-hourly accumulated precipitation, the temporal evolution of the 3-hourly accumulated precipitation during the onset, mature stage and decay, together with the associated circulation features.

Description of the selected events
Table 4 lists the dates of the selected extreme events together with some specific features, such as extreme spatial percentiles of the daily precipitation and the area covered with precipitation above 50 mm/day as depicted by each of the interpolated datasets listed in Table 1. Case 1, occurring on March 12th, 2005, is the most intense event, as denoted by the spatial 99th percentile of the daily rainfall ranging from 141 mm/day to 210 mm/day, depending on the dataset. Inspection of the 3-hourly precipitation of all datasets (not shown) revealed that the event initiated at around 18 LT (local time; UTC-3) on March 11th east of the Andes Mountain range, progressed eastwards reaching the peak precipitation at 06 LT on March 12th and further propagated northeastward while acquiring a northeast-southwest band shape. This is the typical behavior of MCSs in SESA according to the literature (Matsudo and Salio 2011; Romatschke and Houze 2013). Figure 2 shows the 3-hourly peak precipitation for the Case 1 event. Satellite estimates (Fig. 2a, d and g) display two maxima, with the largest peak rainfall ranging from 45 mm to above 100 mm centered on 31°S 61°W and a secondary peak located to the northwest. The discrepancies among datasets in terms of the intensity of the peaks are remarkable.
Case 2 started at 18 LT on November 9th 2015, progressed eastwards reaching the peak precipitation at 06 LT on November 10th over northeastern Argentina close to the border with Brazil and propagated further eastward during its decline (not shown). The spatial 99th percentile of the daily rainfall during the day of maximum rainfall ranges from 111 to 146 mm/day, as depicted by the 3 datasets (Table 4). Figure 3 shows that the peak 3-hourly rainfall (e.g. above 60 mm) is very localized over a small region centered at roughly 28.5°S 56°W, though the area with heavy rainfall extends over a much broader region, as depicted by every dataset. A less intense precipitation system is also apparent to the southwest of the main peak. The agreement in the spatial distribution of rainfall at the time of the maxima among the three observational datasets is apparent. Case 3 started during nighttime hours, between 00 and 03 LT on October 24th 2016. The peak precipitation was reached at 09 LT according to the CMORPH dataset and at 12 LT according to IMERG and MSWEP, being a more explosive event compared with Cases 1 and 2 (not shown). The maximum 3-hourly rainfall for this event displayed in Fig. 4 depicts a localized area with peak intensity centered around 29°S, 58.5°W and a secondary peak further to the west. Additionally, a secondary system developing over the southwestern corner of the domain is also apparent. The observations encompass a wide range of rainfall intensities at the time of the maximum peak, but every dataset agrees on the location and spatial extent of the precipitation maximum. This event is characterized by a much more localized system, which may represent a challenge in terms of model performance. A relevant feature of this event is that heavy rainfall started at night hours, while the other two events started in the afternoon and reached the peak during early morning hours, in agreement with the typical diurnal cycle of extreme rainfall events developing in the region.
Table 4 (caption; columns: Case 1, Case 2, Case 3): The three top rows depict the spatial percentiles of the daily precipitation corresponding to the 90th, 95th and 99th (mm/day) and the bottom row displays the approximate extension of the area (x10^4 km^2) where the daily precipitation is above 50 mm/day. Daily precipitation is computed accumulating 3-hourly precipitation rates from 00:00 LT to 21:00 LT (03:00 to 00:00 UTC).
The discrepancies in the precipitation intensity among datasets for the three extreme events emerge clearly from Table 4 and Figs. 2, 3 and 4, with MSWEP showing the lowest values. The spread in the precipitation amount among datasets increases with the spatial percentile, revealing the difficulty that satellite estimates have in reproducing extreme precipitation. The observational uncertainty associated with extreme rainfall events arises clearly in this analysis. It represents a serious obstacle for assessing model performance, mainly if the focus is on extreme events, given that the assessment of model performance will strongly depend on the observational dataset used.

Simulated precipitation
A comparison of the ensemble mean peak 3-hourly accumulated precipitation between the set of CP experiments (Table 3) and observations for each individual event is displayed in Figs. 2, 3 and 4, respectively.
For Case 1, Fig. 2 shows that all CP simulations broadly capture the event, though there are some differences among the set of model configurations. First, all simulations display the core of the event slightly shifted towards the northwest compared with the observations. The maximum intensity is similarly represented by every CP simulation, but it is difficult to identify the extent to which the simulations are overestimating or underestimating the peak given that the maximum intensity from the set of observations is quite dissimilar. If we compare the two two-step nesting simulations for the D1 domain, the 2.4 km resolution ensemble CP1-2.4/  (Fig. 2b, c), the differences are quite subtle, suggesting that for this individual event the resolution of the CP model does not have a strong impact on the quality of simulated precipitation. Enlarging the domains (D2 vs D1) does not translate into any significant impact (Fig. 2c vs e), nor does the horizontal resolution of the driving domain ( Fig. 2e vs  f). The location and spatial extension of the system is sensitive to the nesting strategy ( Fig. 2e vs h) though only for the larger CP domain (Fig. 2i) the system seems to be better reproduced. In this CP ensemble (CP3-4/noDR) the system extends to the southeast, in a better agreement with the observations, and the unrealistic rainfall close to the western boundary that is present in all the other CPM configurations, is not simulated.
For Case 2 (Fig. 3) every simulation captures the core of the peak precipitation in terms of both location and intensity, though the spatial extension of the system is restricted to a smaller area compared with the observations. No simulation reproduces the broader area with intense rainfall extending over the central part of the domain. The event develops too close to the eastern boundary in the CP domains D1 and D2, suggesting that these domains may be too small for simulating extreme events in SESA that may develop over a broader area extending further east. Moreover, the one-step nesting ensemble over domain D3 (Fig. 3i), CP3-4/noDR, is the only simulation reproducing the secondary maximum rainfall over southern Brazil, though it still fails in capturing the overall rainfall spatial distribution. As was noted for Case 1, no remarkable differences are found between the 2.4 km and the 4 km resolution CP1 ensembles (Fig. 3b vs c). The peak rainfall seems to cover a smaller area for the CP2 ensembles compared with CP1, suggesting that enlarging the CP domain towards the west does not have a systematic impact on improving the quality of the simulated rainfall. Additionally, the one-step nesting over the D2 domain (CP2-4/ noDR) does not display any important difference compared with the two-step nesting simulation (either CP2-4/DR2-12 or CP2-4/DR2-20).
For Case 3 (Fig. 4) every CP simulation fails in reproducing the core of the event at the right location, but a center of rainfall above 15 mm located further to the west, between 64.5° and 61.5°W, is apparent. Only the one-step nesting CP simulation over the D2 domain, CP2-4/noDR (Fig. 4h), reproduces a rainfall center in better agreement with the observations. As for Cases 1 and 2, no differences can be highlighted between CP simulations at different resolutions. Results are sensitive to the size of the domain (e.g. D1 vs D2), though no improvement in the quality of the CP simulations is apparent in the two-step nesting experiments when a larger domain is used (middle row vs top row).
In order to highlight the diversity of the results arising from the members of the physics ensemble, the Taylor diagram of Fig. 5 displays the spatial correlation, normalized standard deviation and normalized centered root mean square error of the maximum 3-hourly precipitation field for each ensemble member and for each event. The Taylor diagram is a frequently used metric to measure performance of simulations against a reference. For Fig. 5 we selected MSWEP dataset as reference, but similar results arise with CMORPH and IMERG datasets as reference. This is because, as shown in Figs. 2, 3 and 4, the spatial distribution of the estimated precipitation is very similar between the observational datasets, differing mainly in the intensity. The Taylor diagram in Fig. 5 not only shows how the different members of the ensemble represented the maximum 3-hourly precipitation but also reveals the spread among them. It is important to highlight that capturing the 3-hourly precipitation field is very ambitious for the simulations. For Case 1 the spread is low in every CP ensemble, as every simulation is clustered around 1.5 standard deviation with spatial correlations ranging from 0.25 and 0.6. For this case the CP3-4/noDR simulations reach the highest correlation coefficients and the CP1-4/DR1-12 the lowest. For Case 2 CP2 simulations (red, green and purple dots) show better performance compared with CP1 and CP3 simulations with dots clustered around higher spatial correlation coefficients compared with those corresponding to CP1 and CP3. Moreover, CP1 ensembles (blue and orange dots) seem to have a larger spread compared with the other CP ensembles. As noted in the previous analysis, Case 3 was poorly represented by all CP simulations. Spatial correlation coefficients are all below 0.2 and for some of the ensemble members negative values were found (not shown). 
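The three statistics summarized in a Taylor diagram can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `taylor_statistics` is hypothetical, both fields are assumed to be already interpolated to a common grid, and every grid point is weighted equally (no latitude weighting), which is an assumption not stated in the text.

```python
import numpy as np

def taylor_statistics(sim, ref):
    """Return (spatial correlation, normalized std, normalized centered RMSE).

    sim, ref: 2-D fields on the same grid (assumption). NaNs are ignored.
    Normalization is by the standard deviation of the reference field.
    """
    sim = np.asarray(sim, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    valid = np.isfinite(sim) & np.isfinite(ref)
    sim, ref = sim[valid], ref[valid]

    # Remove the spatial mean so that only the pattern is compared.
    sim_anom = sim - sim.mean()
    ref_anom = ref - ref.mean()

    sigma_sim = sim_anom.std()
    sigma_ref = ref_anom.std()
    corr = np.mean(sim_anom * ref_anom) / (sigma_sim * sigma_ref)
    crmse = np.sqrt(np.mean((sim_anom - ref_anom) ** 2))
    return corr, sigma_sim / sigma_ref, crmse / sigma_ref
```

The three values are tied by the relation E'^2 = s^2 + 1 - 2 s R (with s the normalized standard deviation, R the correlation and E' the normalized centered RMSE), which is what allows them to be shown on a single two-dimensional diagram.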
Overall, the capability of the CP ensembles in capturing the 3-hourly maximum precipitation seems to be case dependent. However, from Fig. 5 it can be noted that the CP2-4/noDR and CP3-4/noDR display lower spreads among ensemble members compared with the other ensembles.
The variety of skill in representing the peak precipitation of the selected events may arise either because the CP simulations fail to reproduce the event or because they fail to reproduce its temporal evolution. In order to explore the capability of the CP simulations in reproducing the temporal evolution of the systems, Fig. 6 displays the time-series of the spatially averaged 3-hourly accumulated rainfall. Large differences are found among the three different observational datasets, with CMORPH systematically exceeding IMERG and MSWEP datasets. Though there is an overall agreement on the temporal evolution of each event among the three datasets, CMORPH depicts some inconsistencies in the time of the day of the precipitation peak, compared with IMERG and MSWEP. It is worth recalling that both IMERG and MSWEP are blended with station data; hence, they probably agree better with station data in terms of the timing of the events than CMORPH. For Cases 1 and 2 all CP simulations adequately capture the onset, the peak and the decay of the events, with some subtle differences in the intensity of the maximum peak and, to a lesser extent, in the time of the day when the maximum rainfall is simulated. Overall, no systematic differences among the 6 CP ensembles are apparent. For Case 1, the development of a secondary event after the decay of the main event is apparent from the observations (with the IMERG dataset reaching peak precipitation 3 hours earlier than the rest of the observational datasets). Only some of the ensemble members for the two-way nesting strategy display a much less intense secondary peak three hours later.
Fig. 5 Spatial Taylor diagram of the maximum 3-hourly precipitation for each event. The MSWEP dataset is considered the reference. The CMORPH and IMERG datasets have also been included
After the decay of the main event in Case 2, there is also a secondary system clearly identified in the observations from 00 to 18 LT on November 11th 2015. CP simulations can capture the evolution of this secondary system but there are large differences in terms of the timing of the peak intensity and in the rainfall intensity. Only the two-step nesting CPs over the D2 domain adequately captures the temporal evolution and the intensity of this event. For Case 3, all CP simulations underestimate the rainfall amount all along the lifecycle of the event and fail in capturing the time of the maximum peak. However, the evolution of the rainy system developing during the following day is well reproduced. Note that this event has a much smaller spatial scale compared with Cases 1 and 2 and hence, it may be more difficult to be adequately simulated.
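The area-averaged 3-hourly series discussed above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be hourly accumulated precipitation on a (time, lat, lon) grid, the target area is given as a boolean mask, and the ensemble spread is taken here as the member minimum and maximum, which may differ from the definition used for the shading in Fig. 6.

```python
import numpy as np

def three_hourly_area_mean(hourly_precip, mask=None):
    """Area-mean 3-hourly accumulation from hourly accumulations.

    hourly_precip: array (time, lat, lon) in mm per hour (assumed layout).
    mask: optional boolean (lat, lon) array selecting the target area.
    Trailing hours that do not complete a 3-hour block are dropped.
    """
    p = np.asarray(hourly_precip, dtype=float)
    n_blocks = p.shape[0] // 3
    # Group consecutive hours in blocks of three and sum within each block.
    acc = p[: n_blocks * 3].reshape(n_blocks, 3, *p.shape[1:]).sum(axis=1)
    if mask is None:
        mask = np.ones(p.shape[1:], dtype=bool)
    return acc[:, mask].mean(axis=1)

def ensemble_envelope(member_series):
    """Return (mean, min, max) across members for series shaped (member, time)."""
    s = np.asarray(member_series, dtype=float)
    return s.mean(axis=0), s.min(axis=0), s.max(axis=0)
```

Applying `three_hourly_area_mean` to each ensemble member and to each observational dataset with the same mask keeps the comparison restricted to the same target area.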
The evaluation based on the spatial distribution of the peak rainfall and on the temporal evolution of the events is limited because part of the information characterizing the extreme events may be missing. It is important to assess the capability of the CP simulations in capturing the timing, the intensity and the location where precipitation occurs. Accordingly, and to provide a more quantitative measure of the quality of the set of CP simulations, the FSS is evaluated. The minimum spatial scale at which the FSS reaches a skillful value is obtained along the life cycle of each event (i.e. from the onset to the following 24 hours) for each ensemble member, computed against each of the three observational datasets. Figure 7 shows a box-plot of the minimum skillful spatial scale at which each simulation captures the fractional 3-hourly precipitation for each event. The lower the spatial scale, the better the CPM performance. The percentage of times for which the simulation does not reach the minimum spatial scale is shown in the upper part of each boxplot.
Fig. 6 Time-series of the 3-hourly accumulated rainfall averaged over the target area for Case 1 (left), Case 2 (right) and Case 3 (center). The ensemble mean (lines) and the ensemble spread among members of CP simulations (shaded) are displayed for the two-step nesting CP simulations at the CP1 domain (top), the two-step nesting CP simulations at the CP2 domain (middle) and the two one-step nesting simulations over CP2 and CP3 (bottom), respectively
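A minimal sketch of how a minimum skillful scale can be derived from the FSS is given below. It is not the implementation used in the study. It assumes both fields share a regular grid, uses square neighborhoods with zero padding at the edges, and takes as the skillful value the commonly used target 0.5 + f0/2, where f0 is the observed fraction of points above the threshold; the rainfall threshold, the set of window sizes and the grid spacing are all hypothetical inputs.

```python
import numpy as np

def neighborhood_fraction(binary, n):
    """Fraction of exceeding points in an n x n window (n odd), zero-padded."""
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ny, nx = binary.shape
    window_sum = (sat[n:n + ny, n:n + nx] - sat[:ny, n:n + nx]
                  - sat[n:n + ny, :nx] + sat[:ny, :nx])
    return window_sum / (n * n)

def fss(sim, obs, threshold, n):
    """Fractions Skill Score for one threshold and one window size."""
    fs = neighborhood_fraction(np.asarray(sim) >= threshold, n)
    fo = neighborhood_fraction(np.asarray(obs) >= threshold, n)
    mse = np.mean((fs - fo) ** 2)
    reference = np.mean(fs ** 2) + np.mean(fo ** 2)
    return 1.0 - mse / reference if reference > 0 else np.nan

def minimum_skillful_scale(sim, obs, threshold, windows, grid_spacing_deg):
    """Smallest window width (degrees) with FSS >= 0.5 + f0/2, else None."""
    f0 = np.mean(np.asarray(obs) >= threshold)
    target = 0.5 + f0 / 2.0  # assumed definition of the skillful value
    for n in sorted(windows):
        score = fss(sim, obs, threshold, n)
        if np.isfinite(score) and score >= target:
            return n * grid_spacing_deg
    return None  # counted as a case where no skillful scale is reached
```

Returning `None` when no window is skillful mirrors the idea of reporting, alongside the distribution of scales, the percentage of times the score never reaches a skillful value.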
As noted in the previous analysis, increasing the spatial resolution of the CP ensemble does not always imply better model performance. The 2.4 km resolution CP1 ensemble does not display systematically smaller skillful spatial scales compared with the 4 km resolution CP1 ensemble, nor does it display a systematic reduction of the ensemble spread. Moreover, the frequency at which the FSS does not reach a minimum skillful spatial scale is higher for the higher resolution CP1 ensemble. Hence, increasing the spatial resolution of the CP simulation does not necessarily imply better model performance and/or less uncertainty. Accordingly, results for these three single events indicate that no added value is found when increasing the horizontal resolution from 4 to 2.4 km. This is an important outcome given the impact of increasing resolution on the computational cost of the simulations. The spatial resolution of the driving domain does not affect the capability of the CP ensemble in capturing the peak of the events. This result arises when comparing the 4 km CP ensemble nested in either the 12 km or the 20 km driving model over the domain D2 (CP2-4/DR2-12 vs CP2-4/DR2-20). Again, this outcome has important implications in terms of computational costs. However, when the size of the domain is enlarged (e.g., the 4 km CP ensemble over D1 vs the 4 km CP ensemble over D2, blue and orange boxes, respectively), the CP ensemble displays smaller skillful spatial scales, suggesting that the domain size of the CP simulation does have an impact on the capability of the model in capturing the events. This can be observed for Case 1, for which the percentiles of the distributions of the D2 simulations are lower than the same percentiles of the D1 distribution (with the exception of the 95th percentile). Case 2 shows that only the lower half of the distributions of the D2 simulations reach lower skillful scales, and Case 3 shows similar distributions for both D1 and D2 simulations.
However, the percentage of times at which the D1 simulations do not reach the critical FSS should also be considered, as it is systematically higher than for the rest of the simulations. The better performance of D2 simulations against D1 is expected, given that the D2 nested simulations should better capture the triggering mechanisms associated with the topographic forcing compared with the CP ensemble in the D1 domain.
Concerning the nesting strategy, the D2 one-step nesting simulation (CP2-4/noDR) displays an improvement compared with the D2 two-step nesting for Cases 2 and 3 in terms of reducing both the skillful spatial scales and the percentage of missing scales. Enlarging the CP domain in the one-step nesting approach (CP2-4/noDR vs CP3-4/noDR) translates into better model performance only for Case 1. For Cases 2 and 3 the CP3-4/noDR simulations are more skillful only compared with D1 simulations. With respect to D2 two-step nesting simulations, the CP3-4 simulation does not show a clear improvement. Overall, the analysis based on the minimum skillful scale is not conclusive, given that the results are highly dependent on the event. However, it is important to keep in mind that the skill and the computational cost of the simulations are both relevant when deciding on a configuration for longer simulations.
Fig. 7 Boxplot of the minimum spatial scale at which each CP ensemble is skillful. The horizontal axis indicates the CP ensembles, and the vertical axis indicates the minimum spatial scale (in degrees). The minimum spatial scale ranges from 0.1° to 20.1°. The boxes and whiskers are built from the information derived from the FSS computed every 3 hours against each of the three observational datasets and for each of the 6 members of the ensemble. The numbers in the upper part of each boxplot represent the percentage of times in which the FSS score does not reach a skillful value. The brown and green lines indicate the 50th percentile and the mean value of the minimum spatial scales, respectively. The boxes delimit the 25th and 75th percentiles and the whiskers denote the 5th and 95th percentiles. The colors help visualize the nesting strategy: blue for CP1 two-step nested simulations, orange for CP2 two-step nested simulations and red for one-step nested simulations

Synoptic forcings
As already mentioned, extreme precipitation events in SESA are mostly associated with organized deep moist convection. Isolated convective cells usually initiate during the afternoon hours on the east side of the Andes and the Sierras de Córdoba hills located over central Argentina. These systems propagate further eastward and develop into organized convective systems in the presence of the South American low-level Jet (SALLJ). The SALLJ is a northerly wind that advects warm moist air from the Amazon forest into subtropical latitudes (Salio et al. 2007). The SALLJ penetration into higher latitudes, together with a mid-to-upper-level subsidence in the lee side of the Andes that caps the low-level flow and inhibits convection close to the Andes, represent the key ingredients for upscaled convective systems developing over SESA. Furthermore, cyclone formation on the lee side of the Andes is another important feature of convection initiation, enhancing northerly flow and favoring a strong moisture flux convergence that fuels long-lived MCSs over SESA (Salio et al. 2007). Hence, the evolution of MCSs in SESA is highly influenced by the SALLJ and by the low-level circulation pattern. Therefore, we focus on evaluating how the main synoptic forcing mechanisms described above are captured by the set of CP ensembles. Because the CP ensembles are short-term simulations lasting 72 hours, the mid- and upper-level circulation features are not expected to diverge from the driving reanalysis; hence, the analysis is focused on the low-level circulation features at 850 hPa. As shown in Solman et al. (2021), the location of the exit region of the low-level jet and of the moisture flux convergence largely determines the area where the maximum precipitation occurs during the subsequent hours.
Figure 8 displays the 850 hPa geopotential height together with the vertically integrated moisture flux 3 hours before the maximum rainfall rate occurs for Case 1, as depicted by the reanalysis and the CP ensemble mean simulations. Cases 2 and 3 can be found in the Supplementary Information (Figs. S1 and S2). It is interesting to highlight that, for the three events, the ERA5 reanalysis depicts important differences compared with the ERA-Interim dataset in terms of the intensity and southward penetration of the low-level jet and in the spatial configuration of the moisture flux.
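For reference, the vertically integrated moisture flux and its convergence are commonly defined as below. This is the standard textbook form, given here as an assumption about the quantity plotted: q is the specific humidity, V the horizontal wind vector, g the gravitational acceleration, p_s the surface pressure and p_top the upper integration limit, which is not stated in the text.

```latex
\mathbf{Q} = \frac{1}{g} \int_{p_{\mathrm{top}}}^{p_s} q \, \mathbf{V} \, dp ,
\qquad
\mathrm{MFC} = -\nabla \cdot \mathbf{Q}
```

Positive values of MFC indicate moisture flux convergence, which is the quantity expected to peak near the exit region of the low-level jet.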
The ERA5 reanalysis (Fig. 8d) shows a strong southward meridional flow with intensities above 15 m/s, which provides the moist and warm conditions over the target region. The moisture flux penetrates southward with the area of maximum convergence located over the area where the maximum rainfall core is observed (Fig. 2), roughly around 31°S, 61.5°W. The ERA-Interim reanalysis (Fig. 8a) shows a slightly less intense meridional wind and moisture flux. The CP ensembles agree better with the ERA5 than with the ERA-Interim reanalysis, even though they are driven by the latter. This suggests that the horizontal resolution may have a role in how the low-level flow is represented, mostly when the flow configuration is modulated by the topography, as occurs here. For every CP ensemble, the exit region of the low-level jet is collocated with the core of the rainfall event, in agreement with Lavín Gullón et al. (2021) and Solman et al. (2021). Inspection of the same fields in the driving domains (with coarser resolution and parameterized convection) reveals that the moisture penetration was underestimated, resulting in precipitation patterns located further north with respect to the CP simulations and the observations (not shown). The two CP ensembles for the D1 domain, CP1-2.4/DR1-12 and CP1-4/DR1-12, display quite similar results. This is expected given that the CP domain extends over a 12° x 13° (lon-lat) region mostly over a flat area, with the western boundary too close to the Sierras de Córdoba; hence, the topography, either at 2.4 km or 4 km resolution, is not able to modulate the flow within the CP domain. Enlarging the CP domain towards the west (the D2 domain), where the main topographic systems that exert an impact on the initiation of convection are included, does not display any strong impact on the simulated circulation either.
Similarly, the two two-step nesting simulations for the D2 domain, CP2-4/DR2-12 and CP2-4/DR2-20, are very similar, independently of the resolution of the driving domain. The one-step nesting CP ensembles in any of the two domains, CP2-4/noDR and CP3-4/noDR, display a better agreement with the ERA5 reanalysis in terms of the southward penetration of the low level jet and in the location of its exit region, being the CP ensemble over the D3 domain (CP3-4/noDR) the one that better captures the moisture flux convergence, the location of the cyclonic circulation close to the western boundary and the location of the exit region of the low level jet. This is consistent with the better agreement with the observations of the CP3-4/noDR ensemble in terms of the location of the core of the rainfall event mentioned previously. Figure 9 displays the Taylor diagram for the 850 hPa geopotential height and for the vertically integrated moisture flux for each member of every CP simulation compared against ERA5 reanalysis for Case 1 (the same Taylor diagram against the ERA-Interim reanalysis can be found in the Supplementary Information, Figure S3). Note that the spread among ensemble members for each CP ensemble is generally small, with all simulations much alike each other. Note also that the ensemble members from the CP3-4/noDR lie closer to the reference dataset, suggesting that this ensemble arises as the one with the largest spatial correlation coefficient and the smaller root mean square error compared with the reference dataset for both the geopotential height at 850 hPa and the vertically integrated moisture flux. On the other hand the CP1-4/ DR1-12 ensemble arises as the simulation with the lowest spatial correlation coefficient against the reference dataset.
Inspection of the 850 hPa geopotential field and the vertically integrated moisture flux fields displayed in Figure S1 and the corresponding Taylor diagram displayed in Fig. 10 for Case 2 suggests that the simulations are less skilful in reproducing the spatial distribution of the vertically integrated moisture flux compared with Case 1. As can be noted from Figure S1, for every CP ensemble the moisture flux is deflected eastwards and the area of moisture flux convergence is located further to the north compared to ERA5, consistent with the location of the rainfall core. The Taylor diagram for the vertically integrated moisture flux shows that all simulations are clustered together, suggesting that no remarkable differences can be identified among the set of CP ensembles. Compared against ERA-Interim reanalysis (Fig. S3), some members of the one-step nesting simulations are slightly closer to the reference than the rest of CP simulations. Moreover, the Taylor diagram for the geopotential height shows that the two one-step nesting CP simulations (CP2-4/noDR and CP3-4/noDR) arise as those most similar to the ERA5 dataset, with the rest of the CP ensembles clustered around lower values of spatial correlation coefficients. Every simulation shows lower spatial standard deviation than the reference, probably due to their inability to reproduce the cyclonic circulation to the west of the rainfall maximum (Fig. S1).
Fig. 8 Vertically integrated moisture flux (vectors), 850 hPa geopotential height (black contours) and intensity of the meridional component of the wind at 850 hPa (shaded) 3 hours before the peak precipitation occurs for Case 1 (2005-03-12 03 LT). Purple lines display the 20 mm contour of the maximum 6-hourly accumulated precipitation. The two top left panels (a, d) are from ERA-Interim and ERA5, respectively. The rest of the panels display the ensemble mean for each of the 6 CP ensembles
This circulation is important because it enhances the northwesterly flow and modulates the moisture flux convergence. Overall it is apparent that all CP ensemble simulations depict very similar circulation patterns, which explains their agreement in the (mis)representation of the extreme rainfall event. Moreover, from Figure S1 it is apparent that the location of the rainfall event as depicted by the ERA5 reanalysis is closer to the observations compared to that of the ERA-Interim reanalysis. Therefore, the performance of the CP simulations in capturing this event seems to be strongly conditioned by the quality of the driving reanalysis.
For Case 3 the Taylor diagram (Fig. 11) shows that all CP simulations reproduced the geopotential height at 850 hPa field very well compared against the reference reanalysis with correlations above 0.9. On the other hand, the spatial distribution of the vertically integrated moisture flux seems to be not so well captured, as noted from the range of spatial correlation coefficients ranging approximately from 0.5 to 0.8. Taking into account the relevance of the vertically integrated moisture flux on the capability of the models in reproducing the extreme event, as remarked by Lavín Gullón et al. (2021) and Solman et al. (2021), the misrepresentation of this field may be a possible explanation of the misrepresentation of the event. Moreover, it can be noted that CP1 simulations have generally lower values of correlation and standard deviation than the rest of the simulations. In fact, the ensemble mean of the CP1 simulations (Fig. S2) shows that the area of maximum northerly wind and the area of moisture flux convergence is shifted to the west compared with the ERA5 reanalysis. It can also be noted that enlarging the domain of the CP simulations impacts on the simulated circulation. This can be also observed in the spatial distribution of the intensity of the meridional component of the wind and in the moisture flux in Fig. S2. Finally, it is interesting to note that CP2-4/noDR simulations captured the vertically integrated moisture flux better, which may in part explain why the FSS (Figure 7) analysis showed lower spatial scales for that simulation in this event.
Overall, it is apparent that every CP ensemble simulates the core of the rainfall event at the exit region of the low-level jet, where the moisture flux converges. This behavior suggests that, if the CP simulations are able to capture the synoptic scale forcings that provide the environmental conditions for deep moist convection to develop, they are also able to capture the rainfall event. The CP simulations over the D2 and the D3 domains, either the two-step nesting or the one-step nesting CP ensembles, arise as those with better performance in terms of simulating both the synoptic-scale forcings and the extreme precipitation events, compared with the smaller CP domain (D1).
Concerning the nesting strategy, it is important to recall that multiple nestings may deteriorate the quality of the CP simulations due to errors inherited from one driving domain to another, besides being computationally more expensive. On the other hand, the resolution jump between the model providing the LBCs and the driven CPM, which is close to 18 for the CP2-4/noDR and CP3-4/noDR simulations, determines the spin-up distance within the nested domain, i.e., the extension of the area close to the lateral boundaries that should be discarded. As discussed by Matte et al. (2017), the spatial spin-up depends on the spatial resolution of the LBC. These authors suggested that the larger the resolution jump, the larger the number of grid points that should be discarded from the analysis in the nested domain. Hence, considering the tradeoffs between the one-step versus the two-step nestings, and in order to objectively assess the convenience of one strategy over the other, the RMSE of the geopotential height at 850 hPa computed against the reanalysis has been evaluated for the CP ensembles along the length of the simulated period for each case. The RMSE has been computed within the common CP2 domain considering the DR2-12, CP2-4/DR2-12, CP2-4/noDR and CP3-4/noDR ensemble members against both ERA-Interim and ERA5. A region extending 2.25° away from the boundary of the CP2 domain has been discarded to account for the spatial spin-up, even though the estimates by Matte et al. (2017) suggest an extension of approximately 4.5° which, given the size of the CP2 domain, would represent discarding a large part of the CP domain. Each simulation has been bilinearly interpolated to the grid of the reanalysis to which it is being compared (0.25° x 0.25° for ERA5 and 0.75° x 0.75° for ERA-Interim). Figure 12 displays the RMSE against ERA5, as similar results were obtained for ERA-Interim (Fig. S4), with slightly higher values of RMSE.
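The RMSE computation with a discarded boundary band can be sketched as follows. This is an illustrative example rather than the code used in the study: the function name `interior_rmse` is hypothetical, the simulated field is assumed to be already interpolated to the reanalysis grid, and the domain is assumed to be a regular latitude-longitude box described by 1-D coordinate arrays.

```python
import numpy as np

def interior_rmse(sim, ref, lat, lon, margin_deg):
    """RMSE between two (lat, lon) fields, ignoring a band near the edges.

    Points closer than margin_deg to any edge of the box spanned by
    lat and lon are discarded to account for the spatial spin-up.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    keep_lat = (lat >= lat.min() + margin_deg) & (lat <= lat.max() - margin_deg)
    keep_lon = (lon >= lon.min() + margin_deg) & (lon <= lon.max() - margin_deg)
    # np.ix_ builds an open mesh so both masks are applied at once.
    interior = np.ix_(keep_lat, keep_lon)
    diff = np.asarray(sim, dtype=float)[interior] - np.asarray(ref, dtype=float)[interior]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Calling the function once per output time, for example with `margin_deg=2.25` on a 0.25° grid, yields a time series of the kind compared across ensembles, under the assumptions listed above.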
All CP simulations display smaller errors compared with the DR2 simulations, indicating the added value of the CP simulations. The one-step nesting simulations depict smaller RMSEs compared with the two-step nesting CP simulation only for Case 1. Case 2 shows similar RMSE for both the one-step and two-step nesting simulations, while Case 3 shows larger errors at the time of the maximum precipitation for the CP3 one-step nesting simulations. Nevertheless, during the late stages of the events, the RMSEs of the one-step nesting simulations are smaller compared with the two-step nested simulation for the three events. This behavior may have important consequences for longer, climatic scale CP simulations. This result shows that the one-step nesting CP simulations do not deteriorate the quality of the simulated fields compared with the two-step nesting, independently of the size of the CP domain. Regarding the one-step nested simulations, the CP3-4/noDR ensemble displays values of RMSE similar to the CP2-4/noDR. There is no evidence of systematic improvement of one over the other for these three events. Recall that the larger the domain size (in this case CP3 vs CP2), the larger the freedom of the CP simulation to develop a circulation that may differ from the circulation provided at the lateral boundaries of the domain by the reanalysis, so the RMSE suggests that a bigger CP domain did not degrade the quality of the simulated circulation.

Summary and conclusions
This study explores a variety of modeling strategies for performing convection-permitting simulations over Southeastern South America. The aim of the study is to identify the most suitable model configuration for reproducing the evolution of extreme rainfall events and their triggering mechanisms at a feasible computational cost. For that purpose, a set of 6 experiments focused on exploring the choice of the nesting strategy, the spatial resolution and the domain size were performed based on short-term simulations lasting 72 hours using the WRF 3.9.1 model for three extreme precipitation events. For each of the selected cases, a 6-member physics ensemble was built in order to account for physics uncertainty. All simulations were driven by the ERA-Interim reanalysis dataset.
The evaluation of the set of experiments is focused on the temporal evolution and spatial distribution of rainfall against a set of satellite-based precipitation observations. Including a set of observations allows acknowledging the observational uncertainty which is particularly large for extreme events in the target region. Additionally, given the strong control of the synoptic circulation during the early stages of the development of the events, the synoptic forcings were also evaluated against two different reanalyses: ERA-Interim and ERA5. The analysis based on individual events contributes to better understanding the mechanisms that may explain model deficiencies.
Several research questions drove this study. The question concerning the need of improving the resolution of the convection-permitting simulations was addressed by comparing the two-step nesting CP ensemble at 2.4 km resolution against a 4 km resolution for a CP domain spanning an area of 13° x 12° (lon-lat). No significant differences were found between these two sets of experiments in terms of their capability in reproducing the main features of the extreme events together with the structure of the meridional flow at 850 hPa, the moisture flux convergence and the low-level circulation patterns. Additionally, the two-step nesting 2.4 km resolution CP ensemble required 3.5 times the computational resources of the two-step nesting 4 km resolution one over the same CP domain. Hence, considering the balance between the computational cost and the quality of this pair of simulations, it is concluded that 4 km is a reasonable resolution choice for CP simulations, as no added value is apparent at higher resolution. To explore whether the resolution of the driving domain in a two-step nesting approach matters, two sets of experiments were compared in which the resolution of the driving domain was 12 km and 20 km, respectively. It was found that increasing the resolution of the driver domain introduces only subtle differences in the performance of the CP ensembles, shown in terms of the smaller minimum spatial skillful scales found for the CP2-4/DR2-12 compared with the CP2-4/DR2-20 ensembles.
Fig. 12 Temporal evolution of the RMSE (m) computed for the 850 hPa geopotential height against the ERA5 reanalysis for Cases 1, 2 and 3 (left, central and right panels, respectively) for each ensemble member for DR2-12 (blue), CP2-4/DR2-12 (orange), CP2-4/noDR (green) and CP3-4/noDR (red). Vertical dashed lines denote the time of the maximum rainfall for each case. Times are displayed in LT
This may be due to the fact that no relevant differences arise when the horizontal resolution of the driving domain increases from 20 to 12 km. A remarkable improvement in the quality of the CP ensemble was found when enlarging the CP domain size from the 13 • x12 • (lon-lat) to the 25 • x16 • , referred to as CP1-4(12) and CP2-4(12), respectively. The major difference between these two domains are the domain size and the position of the western boundary, which is located over the eastern Pacific Ocean, i.e. shifted to the west in the CP2 domain compared with the CP1 domain. This improvement is apparent in the simulated precipitation and in the synoptic circulation triggering the extreme events. This result was expected given the relevant role of the Andes and the Sierras de Cordoba topographic features in the initiation of convection, given the environmental favorable conditions that fuels the subsequent eastern progression and upscale of the systems. Hence, it is recommended that convection-permitting simulation for SESA should be performed over a domain that extends westwards at least over the eastern Pacific Ocean to include the topographic forcing which triggers convection and modulates the synoptic scale circulation.
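The ensembles above are ranked partly by their minimum skillful spatial scale, which this section does not restate. The following is a minimal sketch of one common way to obtain such a scale, assuming it is derived from the Fractions Skill Score (FSS) with the usual target value 0.5 + f0/2, where f0 is the observed fraction of cells exceeding the rainfall threshold. That assumption, the function names and the zero padding at the domain edges are illustrative choices, not a description of the procedure used in this study.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of exceeding cells in an n x n window (n odd), zero-padded at the edges."""
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Integral image with a leading row and column of zeros.
    s = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    s[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ny, nx = binary.shape
    window_sum = (s[n:n + ny, n:n + nx] - s[:ny, n:n + nx]
                  - s[n:n + ny, :nx] + s[:ny, :nx])
    return window_sum / (n * n)

def fss(sim, obs, threshold, n):
    """Fractions Skill Score for one threshold and one window size (in grid cells)."""
    f_sim = neighbourhood_fraction(sim >= threshold, n)
    f_obs = neighbourhood_fraction(obs >= threshold, n)
    mse = np.mean((f_sim - f_obs) ** 2)
    ref = np.mean(f_sim ** 2) + np.mean(f_obs ** 2)
    return 1.0 - mse / ref if ref > 0 else np.nan

def min_skillful_scale(sim, obs, threshold, sizes):
    """Smallest window size whose FSS reaches 0.5 + f0/2, or None if none does."""
    f0 = np.mean(obs >= threshold)
    target = 0.5 + f0 / 2.0
    for n in sizes:
        if fss(sim, obs, threshold, n) >= target:
            return n
    return None

# Synthetic example: one observed heavy-rain cell and a simulated cell displaced by two grid points.
obs = np.zeros((21, 21))
sim = np.zeros((21, 21))
obs[10, 10] = 10.0
sim[10, 12] = 10.0
scale = min_skillful_scale(sim, obs, threshold=1.0, sizes=[1, 3, 5, 7])
```

The returned scale is in grid cells; multiplying by the grid spacing of the common verification grid gives a length in km. A smaller value means the simulation places heavy rainfall closer to where it was observed.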
Concerning the nesting strategy, a two-step nesting approach was compared with a one-step nesting approach in which the CP simulations were driven directly by the reanalysis. Additionally, the one-step nesting was implemented for two domain sizes, extending over a 25° × 16° lon-lat box (CP2 domain) and over a larger area (29° × 21°; CP3 domain). In the latter, the northern boundary is shifted further north so that the area where the low-level jet reaches its maximum intensity lies within the CP domain. This larger domain is also extended towards the east, to better capture the incoming flow from the northeast and from the Atlantic Ocean, which increases the moisture flux towards SESA. The one-step nesting simulations, for both the CP2 and CP3 domains, showed some improvements over the rest of the CP simulations in terms of the minimum skillful spatial scales, suggesting that these simulations better represent localized heavy rainfall throughout the temporal evolution of the events. Additionally, these simulations reproduced the synoptic forcings associated with the events with skill similar to that of the two-step nesting simulation over the CP2 domain. The spread among ensemble members was also slightly smaller for the one-step nesting approach than for the two-step nesting approach, as expected, and this was particularly evident after the decay of the extreme event. This also holds for the larger CP domain. This feature is particularly important when considering CP simulations covering longer periods and is worth exploring in a separate study.
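Two diagnostics underlie the comparison above: the per-member RMSE of the 850 hPa geopotential height against a reanalysis (as displayed in Fig. 12) and the spread among ensemble members. A minimal sketch of both is given below. It assumes that the simulated and reference fields are already on a common grid, that domain averages are unweighted, and that spread is measured as the standard deviation across members; the function names are hypothetical and these choices are not necessarily those made in this study.

```python
import numpy as np

def rmse_series(sim, ref):
    """RMSE per member and time step.

    sim: array (member, time, lat, lon); ref: array (time, lat, lon).
    Returns an array (member, time), using an unweighted domain average.
    """
    diff = sim - ref[np.newaxis]
    return np.sqrt(np.mean(diff ** 2, axis=(-2, -1)))

def ensemble_spread(sim):
    """Domain-mean standard deviation across members, per time step.

    sim: array (member, time, lat, lon). Returns an array (time,).
    """
    return np.mean(np.std(sim, axis=0), axis=(-2, -1))

# Synthetic example: 6 members, 4 time steps, 3 x 5 grid.
# Member m is offset from the reference by a constant m metres.
ref = np.zeros((4, 3, 5))
sim = np.arange(6, dtype=float)[:, None, None, None] + np.zeros((6, 4, 3, 5))
rmse = rmse_series(sim, ref)      # shape (6, 4)
spread = ensemble_spread(sim)     # shape (4,)
```

Plotting each row of `rmse` against time gives one curve per ensemble member, and comparing `spread` between two configurations shows which ensemble diverges more after the event decays.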
Focusing specifically on the one-step nesting approach, enlarging the CP domain has both benefits and drawbacks. The larger CP domain allows the synoptic drivers of the extreme events to be better reproduced, given the importance of the topography in modulating the low-level circulation. On the other hand, the computational cost of the simulations also increases, which may limit the technical capability for performing longer-term climatic simulations. It is worth highlighting that the one-step nesting CP2-4/noDR ensemble was 30% less expensive than the two-step nesting CP2-4/DR2-12 ensemble. Additionally, the one-step nesting CP3-4/noDR ensemble had a computational cost similar to that of the two-step nesting CP2-4/DR2-12 ensemble, but with a considerably larger CP domain. Moreover, for the three individual events evaluated in this work, the one-step nesting CP simulations do not deteriorate the quality of the simulated fields compared with the two-step nesting, independently of the size of the CP domain. This behavior may have important consequences for longer, climatic-scale CP simulations. A major drawback of the one-step nesting approach is the resolution jump between the model providing the lateral boundary conditions (LBC) and the CPM. Matte et al. (2017) suggested that the larger the resolution jump, the larger the area close to the boundaries that should be discarded due to the spatial spin-up. However, given that multiple nesting increases the biases that are transferred from one domain to another, the one-step nesting approach may avoid this shortcoming.
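The cost figures quoted in these conclusions can be set against simple scaling estimates. The sketch below is a rough back-of-the-envelope check only: it treats each domain as a flat lon-lat box (ignoring the convergence of meridians), takes the "similar" cost of CP3-4/noDR as exactly equal to that of CP2-4/DR2-12, and assumes that cost scales with the number of grid points and, optionally, with a time step proportional to the grid spacing. All variable names are hypothetical.

```python
# Domain extents in degrees (lon, lat), as quoted in the text.
DOMAINS = {"CP1": (13.0, 12.0), "CP2": (25.0, 16.0), "CP3": (29.0, 21.0)}

def area_ratio(a, b):
    """Ratio of lon-lat box areas of domain a to domain b (flat-box approximation)."""
    lon_a, lat_a = DOMAINS[a]
    lon_b, lat_b = DOMAINS[b]
    return (lon_a * lat_a) / (lon_b * lat_b)

# Relative costs from the text, normalised to CP2-4/DR2-12 = 1.0.
# Assumption: "similar" cost for CP3-4/noDR is taken as exactly 1.0.
REL_COST = {"CP2-4/DR2-12": 1.0, "CP2-4/noDR": 0.7, "CP3-4/noDR": 1.0}

# CP3 versus CP2, both one-step nested at 4 km.
cp3_vs_cp2_area = area_ratio("CP3", "CP2")                              # 609 / 400
cp3_vs_cp2_cost = REL_COST["CP3-4/noDR"] / REL_COST["CP2-4/noDR"]       # 1 / 0.7

# 2.4 km versus 4 km over the same domain.
points_ratio = (4.0 / 2.4) ** 2            # horizontal grid points only
points_and_step_ratio = (4.0 / 2.4) ** 3   # grid points and a proportionally shorter time step
reported_ratio = 3.5                       # value quoted in the text for the two-step ensembles
```

Under these assumptions the CP3 domain has about 1.5 times the area of CP2 and an implied cost ratio of about 1.4, so the reported costs are broadly in line with the increase in grid points. The reported factor of 3.5 for the 2.4 km ensemble lies between the two naive resolution estimates (about 2.8 and 4.6).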
Given the multiple lines of evidence on the advantages and limitations of the various CP configurations discussed in this study, we consider the one-step nesting approach operating at 4 km over the D3 domain as the recommended CPM configuration for SESA. We are aware, however, that the simulations evaluated here last 72 h and hence are strongly controlled by the initial conditions. Although the behavior of longer-term simulations covering several months or years cannot be directly extrapolated from this exercise, the results described in this work can be considered as guidance for planning long-term climatic simulations over SESA with CPM ensembles.
Finally, note that the analysis of the 3-hourly precipitation during the extreme events was performed after interpolating all fields to a common grid, specifically by upscaling to a lower-resolution grid. The interpolation procedure deserves special attention given the sensitivity that precipitation fields, particularly for heavy events, may have to the interpolation method (e.g. Herrera García et al. 2016). This should be considered in further studies. Another important issue is related to the reanalysis driving the simulations. Given that ERA5 has been demonstrated to outperform the ERA-Interim reanalysis, and given the importance of the synoptic scale forcing in triggering extreme precipitation events in the region, further studies should consider using ERA5 to drive the convection-permitting model.
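The sensitivity of heavy precipitation to the upscaling method noted above can be illustrated with a small synthetic example. The sketch below compares two simple ways of moving a field to a coarser grid: block averaging, which preserves the domain mean when cells have equal area, and point subsampling, which does not. It assumes a regular grid whose dimensions are exact multiples of the coarsening factor; the function names are hypothetical and neither method is claimed to be the one used in this study.

```python
import numpy as np

def block_mean(field, factor):
    """Upscale by averaging non-overlapping factor x factor blocks."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be multiples of the coarsening factor")
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

def subsample(field, factor):
    """Upscale by keeping every factor-th grid point (no averaging)."""
    return field[::factor, ::factor]

# Synthetic 4 x 4 field with a single 80 mm cell of heavy rainfall.
fine = np.zeros((4, 4))
fine[1, 1] = 80.0

coarse_mean = block_mean(fine, 2)   # the peak is spread over its 2 x 2 block
coarse_pick = subsample(fine, 2)    # the peak is missed entirely
```

In this example block averaging conserves the domain-mean rainfall but reduces the peak from 80 mm to 20 mm, while subsampling loses the event altogether. This is why the choice of upscaling method matters most for localized extremes.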