Retrospective sub-seasonal forecasts of extreme precipitation events in the Arabian Peninsula using convective-permitting modeling

This work demonstrates the potential of extreme cool-season precipitation forecasts in the Arabian Peninsula (AP) at sub-seasonal time scales, identifies regions and periods of forecast opportunity, and investigates the predictability of synoptic-scale forcing at sub-seasonal time scales. To this end, we simulate 18 extreme precipitation events using the convective-permitting Weather Research and Forecasting (CP-WRF) model with lateral boundary forcing from the European Centre for Medium-Range Weather Forecasts sub-seasonal to seasonal reforecasts (ECMWF S2S reforecasts). The simulations are initiated at one-, two-, and three-week lead times. At all lead times, the CP-WRF improves the mean accumulated precipitation in the extratropical synoptic regimes over the west coastal and central AP and the central Red Sea compared to ECMWF S2S reforecasts, as evaluated against the Global Precipitation Measurement Final (GPMF) and King Abdullah University of Science and Technology reanalysis (KAUST-RA) precipitation products. Based on categorical statistics with a threshold of 20 mm accumulated precipitation over 7 days, the CP-WRF skillfully forecasts the precipitation over Jeddah, the west coast of the AP, and the central Red Sea up to a three-week lead time. The relative operating characteristic curve reconfirmed the high forecast skill of the CP-WRF, with an area under the curve above 0.5 in most of the events at all lead times. Finally, the correlation coefficients between the ECMWF S2S reforecast and ECMWF reanalysis interim 500 hPa geopotential heights are higher in the events associated with an extratropical synoptic regime than in those associated with a tropical synoptic regime, regardless of lead time. Therefore, the convective-permitting model can potentially improve the accuracy of extreme winter precipitation forecasts at two- and three-week lead times over Jeddah, the west coast of the AP, and the central Red Sea in the extratropical synoptic regime.


Introduction
More reliable forecast systems are needed for early warning of extreme weather events (Vitart and Robertson 2018) such as heavy precipitation, flash flooding, and heat waves (e.g., Liang and Lin 2018; Vuillaume et al. 2018; Grimm et al. 2021). The socioeconomic impacts of such events have become more profound in today's changing climate (e.g., Smith and Katz 2013). We are interested in extreme precipitation events that cause flash flooding on the Arabian Peninsula (AP), one of the driest and most water-limited environments in the world (Köppen 1936; Peel et al. 2007; Chen and Chen 2013). Flash flooding occurs when heavy precipitation falls on dry soils with low soil-infiltration capacity (Al Saud 2010; Almazroui 2011), filling ephemeral stream channels (called wadis in the Middle East) within a matter of minutes (Haggag and El-Badry 2013; Deng et al. 2015). In recent decades, the AP has experienced rapid socioeconomic development, expansion of urbanization and agricultural activities, and high population growth (Luong et al. 2020b; Hoteit et al. 2021). Episodic extreme precipitation events and flash flooding have increased concurrently, causing hundreds of casualties and infrastructural damage costing more than one billion U.S. dollars (Alkhalaf and Basset 2013; Haggag and El-Badry 2013; de Vries et al. 2016).
Climatologically, most precipitation in the AP occurs during the cool season from November to April. On average, the southwestern part of the AP receives approximately 300 mm of rainfall (60% of the annual precipitation) from October to May, while the rest of the region receives very small amounts of precipitation (Almazroui 2011). During the warm season from May to October, the precipitation at the southwestern tip of the AP is principally generated by southwesterly Indian monsoon winds and orographic lifting (Atlas 1984; Almazroui 2011; Viswanadhapalli et al. 2016). The present study focuses on extreme precipitation events over the AP during the cool season, which are often associated with mesoscale convective systems (MCSs; de Vries et al. 2018). In general, MCSs occurring over land produce large areas of heavy precipitation (covering 100 km or more) and strong winds (e.g., Houze 1993; Schumacher and Rasmussen 2020). The driving cool-season synoptic patterns that favor MCS occurrence over the AP are detailed in Sect. 2.
Our objective is to improve the accuracy of retrospective sub-seasonal forecasts of extreme precipitation in the AP. The time period of sub-seasonal forecasts falls between those of medium-range numerical weather predictions (one to two weeks) and seasonal forecasts (three to six months) (Robertson et al. 2020). Therefore, the sub-seasonal forecast window lies between two weeks and less than two months. Within this window, the three- and four-week lead time forecasts are most important. Reliable sub-seasonal forecasts are needed for planning and management decisions in multiple sectors, including agriculture, energy, water resources, and emergency preparedness (e.g., Mariotti et al. 2020; Vitart and Robertson 2015; Vitart et al. 2017; Vitart and Robertson 2018). Particularly on the AP, improved sub-seasonal forecasts of extreme precipitation events would mitigate physical damage and loss of life from flash flooding, conferring significant socioeconomic benefits. Forecasting within this time scale has been challenging (e.g., Collier and Zhang 2007; Becker 2017), because the forecast lead time is sufficiently long to lose much of the memory of the atmospheric initial conditions, yet too short to be influenced by variability of the ocean boundary conditions (Liang and Lin 2018).
Sub-seasonal predictability has been well evaluated in the context of existing forecast systems throughout the world with relatively coarse spatial resolutions (> 0.5°), as compared to convective-permitting short-term numerical weather prediction forecasts. Within Asia, these modeling systems have shown value in the sub-seasonal prediction of summer monsoon precipitation (Jie et al. 2017), extreme precipitation in Sri Lanka (Vuillaume et al. 2018), and temperature and precipitation in East Asia (Liang and Lin 2018), especially during periods of 'forecast opportunity' related to modes of climate variability (Mariotti et al. 2020; Merryfield et al. 2020). These modes include the boreal summer intraseasonal oscillation, the Madden-Julian Oscillation, and the El Niño Southern Oscillation.
The present study investigates whether there is value added by convective-permitting modeling at the sub-seasonal forecast timescale in a target region where the forecast is realized, in this case the AP. The grid spacing of convective-permitting models (CPMs) is nominally less than 4 km, so that deep convection is explicitly represented without a deep convective parameterization scheme (e.g., Prein et al. 2015). CPMs have been applied in weather forecasting (e.g., Clark et al. 2016) and climate projection (e.g., Prein et al. 2015; Kendon et al. 2017; Lind et al. 2020; Lucas-Picher et al. 2021; Ban et al. 2021), resulting in considerable improvements in quantitative precipitation forecasts. Included in the CPM literature are some studies simulating extreme weather events over the AP (e.g., Deng et al. 2015; Dasari et al. 2017; Luong et al. 2020b) and incorporating data assimilation (Viswanadhapalli et al. 2016) to improve short-range forecasts, which inform the methodological approaches in this work. With respect to convective precipitation, CPMs add substantial value in determining the amount, timing, and storm structure of convective precipitation (e.g., Kendon et al. 2012; Ban et al. 2014; Coppola et al. 2020; Chan et al. 2020; Kouadio et al. 2020). However, the use of CPMs still presents some unresolved challenges. Besides requiring high computing power, some studies have found that CPMs still have problems with respect to convective propagation (e.g., Risanto et al. 2019; Hassim et al. 2016), small showers (e.g., Stratton et al. 2018), and excessively intense precipitation (e.g., Liu et al. 2017).
It is technically possible to dynamically downscale long-term reforecast products at S2S timescales to investigate potential value added in forecast skill. Such data are currently available from the S2S Project and the Sub-seasonal Experiment (Sub-X) project (Pegion et al. 2019). The value added by dynamical downscaling of global models for retrospective seasonal forecasts has been investigated through coordinated community efforts. An example of seasonal forecasting on meso-β grid scales (tens of km) is the Multi-RCM Ensemble Downscaling of Multi-GCM Seasonal Forecasts (MRED) experiment (Shukla and Lettenmaier 2013; De Haan et al. 2015). MRED and similar works have shown that regional models can improve the forecast skill where there is already some pre-existing skill in the global model, add the greatest value during periods of 'forecast opportunity' when large-scale climate variability projects strongly on the regional precipitation, and better represent mesoscale meteorological processes (e.g., Shukla and Lettenmaier 2013; De Haan et al. 2015; Castro et al. 2012).
In this study, we apply a CPM reforecast for the top convective extremes in the AP at sub-seasonal time scales. The extreme events were identified based on precipitation amount recorded near the city of Jeddah. This work is important not only for improving the capability of extreme weather forecasting and enhancing the environmental sustainability of the AP (Hoteit et al. 2021), but also for establishing a replicable methodological approach for sub-seasonal CPM in other parts of the world. The goals of this study are threefold: (1) to demonstrate the potential improvement of extreme cool-season precipitation forecasting of a CPM over the AP with one-, two-, and three-week lead times, (2) to identify the regions and periods of forecast opportunity, and (3) to investigate the predictability of synoptic-scale forcing at these sub-seasonal time scales. The remainder of this manuscript is structured as follows. Section 2 discusses the synoptic patterns associated with extreme convective precipitation events over the AP during the cool season. Section 3 describes the data and methods of the analysis performed on 21 extreme precipitation events. Section 4 presents the results, emphasizing the value added by the CPM during periods of forecast opportunity. Sections 5 and 6 conclude the work with a general discussion and a summary of the main conclusions, respectively.

Synoptic environments of extreme cool-season precipitation in the Arabian Peninsula
Prior studies of convective precipitation events have described the favorable conditions for MCSs over the AP during the cool season from November to April (e.g., Alkhalaf and Basset 2013; Haggag and El-Badry 2013; de Vries et al. 2016, 2018; Luong et al. 2020a). On the synoptic scale, a mid-latitude middle-to-upper level (850-500 hPa) trough in the eastern Mediterranean interacts with a Red Sea trough (RST) near the surface. This interaction is influenced by a stationary lower-to-middle level Arabian anticyclone (AA) centered over the southeastern AP and the Arabian Sea. As the mid-latitude trough intrudes into the subtropics and propagates eastward, it amplifies the RST, deflecting the moisture flow from the Red Sea and the Arabian Sea toward the AP (de Vries et al. 2018). The interaction between the mid-latitude trough and the AA at 850 to 700 hPa creates a Red Sea convergence zone (RSCZ; Pedgley 1966; Langodan et al. 2017; Viswanadhapalli et al. 2017) at approximately 18°N. It also drives intense winds between the coastal mountain gaps along the African coast, most prominently the Tokar jet (Langodan et al. 2014; Davis et al. 2015). These cross-Red Sea winds gather the moisture from the sea surface and transport it eastward (Hoteit et al. 2021). The low-level moisture transport combined with orographic lift and diabatic heating over the mountain ranges in the southwestern AP increases the atmospheric instability, facilitating convective organization and extreme precipitation (de Vries et al. 2016; Dasari et al. 2017).
Recently, the dynamic and thermodynamic aspects of MCS-driven extreme precipitation over the AP have also been reported. De Vries et al. (2013) demonstrated that extreme precipitation events over the western AP are generated by an active RST (ARST). During the ARST, the RST extends northward and the AA intensifies, increasing the southerly advection of warm moist air and enhancing the tropospheric instability, especially over the southwestern AP. The large amount of moisture falls as heavy precipitation. The severity of this precipitation is closely related to the potential vorticity (PV) intrusion toward the subtropics and the integrated water vapor transport (IVT; de Vries et al. 2018). As the stratospheric PV intrudes further south and the IVT magnitude increases, the precipitation becomes more extreme.
Extreme cool-season precipitation events, especially around the city of Jeddah, are also related to synoptic-scale weather patterns over the AP. Luong et al. (2020a) classified these events into three synoptic-scale circulation regimes or clusters, namely, extratropical (Fig. 1a), transitional (Fig. 1b), and tropical (Fig. 1c) regimes. The cluster classification was based on self-organizing map (SOM) analysis, which is an unsupervised machine learning technique used to produce a two-dimensional representation (map) from a higher dimensional data set while preserving the topological structure of the data (e.g., Hewitson and Crane 2002). The classification process places similar modes closer to each other following a smooth transition. Luong et al. (2020a) applied this technique to objectively group similar extreme precipitation events with common or similar patterns into the three distinct clusters.
The classification analysis used 18 daily-averaged atmospheric variables derived from the ECMWF reanalysis interim (ERA-Interim) and was conducted over the whole AP (10-50° N, 0-70° E) during November to April of each year from 1979 to 2018. The variables included zonal wind, meridional wind, and geopotential height at 850 hPa, 700 hPa, 500 hPa, and 200 hPa pressure levels, potential vorticity at the 330 K potential temperature level, vertically integrated eastward and northward water vapor fluxes, zonal and meridional wind at 10 m, and mean sea-level pressure. The dynamics and thermodynamics of the atmosphere (up to the tropopause) of each event are projected onto the three synoptic modes by simultaneously analyzing these 18 variables. Each event is reshaped to a one-dimensional vector (whose length is the number of points in the x direction × number of points in the y direction × number of variables). After computing the Euclidean distances between each event vector and the three mode vectors, the event is projected onto the mode with the least distance. The large-scale circulations for extreme precipitation in the region are represented by the intensity, shape, and position of the main synoptic features, i.e., Red Sea trough, Arabian anticyclone, upper-level trough, and moisture flux. Luong et al. (2020a) associated the most intense and organized precipitation events with the RST and AA combined with the equatorward deep intrusion of the upper-level trough (850 hPa) (Fig. 1a). The precipitation over Jeddah generally reduces when the synoptic pattern is strongly influenced by tropical air in the RST (Fig. 1c).
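The projection step described above can be illustrated with a minimal sketch. It assumes that each event and each mode is already available as a set of gridded fields, and it uses plain Euclidean distance on the flattened vectors; the function names, the toy 2 × 2 grid, and the three mode vectors are hypothetical, and any normalization of the variables used in the original analysis is not reproduced here.

```python
import math


def flatten_event(fields):
    """Flatten a list of 2-D fields (one per variable) into one 1-D vector."""
    return [value for field in fields for row in field for value in row]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_mode(event_vector, mode_vectors):
    """Return the index of the mode vector closest to the event vector."""
    distances = [euclidean_distance(event_vector, m) for m in mode_vectors]
    return min(range(len(distances)), key=distances.__getitem__)


# Toy example (hypothetical values): 2 variables on a 2 x 2 grid.
event = flatten_event([[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]])
modes = [
    [0.0] * 8,                                  # stand-in for mode 0
    [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0],   # stand-in for mode 1
    [5.0] * 8,                                  # stand-in for mode 2
]
best = nearest_mode(event, modes)
```

In this toy case the event vector has length 2 × 2 × 2 = 8 and is assigned to the second mode, which differs from it only in the second variable.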
In addition, Dasari et al. (2018) suggested that synoptic-scale patterns are also influenced by the El Niño Southern Oscillation. During El Niño years, the RST extends farther north, shifting the RSCZ northward to around 20.5°N. This shift significantly increases the total winter precipitation in the northern Red Sea region. During La Niña years, the RST appears over the southern Red Sea, shifting the RSCZ southward to around 18.5°N.

WRF domain and configuration
Our retrospective sub-seasonal simulations applied the Advanced Research version of the Weather Research and Forecasting (WRF-ARW; Skamarock and Coauthors 2008; Powers et al. 2017) model version 3.8. Our model configuration implemented two two-way nested domains with horizontal resolutions of 20 km (d01) and 4 km (d02), each with 53 vertical levels in eta coordinates (see Fig. 2). As the inner domain has a convective-permitting horizontal grid spacing, it explicitly represents deep convection at the meso-γ scale (1-4 km) without a convective parameterization scheme. This convective-permitting configuration has been implemented for the AP region in previous studies. For example, Viswanadhapalli et al. (2016) adopted the Kain-Fritsch scheme, but our outer domain employed the Grell-Freitas cumulus parameterization scheme (Grell and Freitas 2013) following Gao et al. (2017), who demonstrated that the Grell-Freitas scheme is less sensitive to the model resolution than the Kain-Fritsch scheme. Other physical models applied to both domains are shown in Table 1. The forecasts are free running and do not apply any spectral nudging or reinitialization in the outer and inner domains.
We note that other studies (e.g., Carrillo et al. 2017) applied the sea surface temperature (SST) bias correction in a dynamically-downscaled climate forecast reanalysis. They chose this correction because the SST influences the convective organization. As our study investigates the potential of sub-seasonal forecasts, we do not apply an SST bias correction. We used SST data available in the boundary forcing, to be described in the following subsection.

Fig. 1 Synoptic-scale circulation of the extratropical (a), transitional (b), and tropical (c) regimes. Black, red, blue, and green contours represent the mean sea level pressure, geopotential heights at 850 hPa, geopotential heights at 500 hPa, and vertical integral water vapor flux, respectively. Extreme events are associated with the Red Sea trough (RST), Arabian anticyclone (AA), upper-level trough (ULT), and moisture flux (MF). The green arrows represent wind speed and direction at 10 m. The yellow star indicates the city of Jeddah

ECMWF S2S reforecast datasets
As the lateral boundary conditions and initial conditions of the simulations, we used the licensed version of ECMWF S2S reforecast datasets with 12-hourly temporal resolution. The datasets (hereafter referred to as raw ECMWF) include surface variables: skin temperature, sea surface temperature (SST), U wind at 10 m, V wind at 10 m, sea level pressure, surface pressure, relative humidity (RH), temperature, and dewpoint temperature at 2 m, sea ice, snow depth and density. The 3D variables include U and V winds, specific humidity, temperature, height at 11 pressure levels (i.e., 1000, 925, 850, 700, 500, 400, 300, 200, 100, 50, 10 hPa), and 4-layer soil variables of moisture and temperature. We also used the SST from the ECMWF S2S reforecast dataset updated every 12 h. The datasets, which are part of The Observing system Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) database (up to 15 days, medium range forecast) and the Climate-System Historical Forecast project (CHFP, seasonal forecast), range from 0 to 46 days with a spatial resolution of 0.2° × 0.2° (T639) on days 0 to 15 and 0.4° × 0.4° (T319) after day 15 (https://www.ecmwf.int/en/forecasts/datasets/set-vi#VI-va). There are 91 vertical levels. The reforecast datasets are generated twice weekly (Mondays and Thursdays) with 11 ensemble members in each initialization. Using the actual forecast model, the reforecasts are run for the past several years on the same calendar day as the forecast. These reforecasts are used for calibrating the actual forecasts. The reforecast years of the datasets now cover the past 20 years. The S2S reforecast database can be accessed online at http://apps.ecmwf.int/datasets/data/s2s; http://s2s.ecmwf.int.

Simulations and classification of extreme events over the AP
Among the winter convective events from 1999 to 2018, we selected the top 21 extreme precipitation events with recorded rain gauge measurements over 20 mm day⁻¹ near Jeddah (see Table 2). All events were verified against the spatial precipitation distribution obtained from the Global Precipitation Measurement daily precipitation product (GPM_3IMERGDF; Huffman and Coauthors 2018). The WRF S2S simulations of the individual convective extreme events were initialized at approximate lead times of three weeks (W3), two weeks (W2), and one week (W1) using all 11 ensemble members. The exact initialization date was ... The W3, W2, and W1 simulations were integrated over 21, 14, and 7 days, respectively, and outputs were generated every three hours. All simulations were ended several days after the event, giving a ± 3 day accumulated precipitation centered on the day of the event for statistical verification purposes of W2 and W3.

For the AP, climatology analyses are often constrained by data availability. The synoptic patterns are found to be a good indicator for early detection of convective precipitation (e.g., Crawford et al. 2020; Dasari et al. 2018; Nguyen-Le et al. 2017; Merino et al. 2016). The AP synoptic pattern analysis resulting from the SOM technique applied by Luong et al. (2020a) demonstrated that the trend of precipitation extremes in the AP is mainly influenced by synoptic conditions. From 1979 to 1998, extreme events in the AP were more associated with tropical synoptic influence, whereas from 1999 to 2018, the extratropical synoptic influence was more dominant. Since one of the main objectives of our study is to identify the window of opportunity to forecast convective extremes at sub-seasonal time scales in the AP, we adopted the synoptic classification of Luong et al. (2020a) (hereafter referred to as Luong-SOM analysis). By analyzing the forecasts based on the classification, we would gain more insights into the aspect of forecast opportunity.
We found that ten and eight events were classified as dominantly influenced by extratropical and tropical modes, respectively, and three events were transitional. Table 2 shows the classification of each event. As there are few transitional precipitation events, they were excluded from our analysis and discussion.

Ground reference
To evaluate the predictability of the precipitation forecasts in each synoptic classification, we required a precipitation ground reference. For this purpose, we used the precipitation products from the Global Precipitation Measurement Final Precipitation L3 v06 (GPM_3IMERGHHV06; hereafter referred to as GPMF; Huffman et al. 2018) and the King Abdullah University of Science and Technology Reanalysis (hereafter referred to as KAUST-RA; Dasari et al. 2019). In observation-limited regions such as the AP, the observation datasets require sufficient spatial and temporal availability and must reasonably represent the ground-based measurements. Being a reanalysis based on the regional model described in Viswanadhapalli et al. (2017), KAUST-RA is a viable alternative option for regions lacking a sub-daily precipitation product based on rain gauges and/or radar-derived precipitation. The AP region fits this category. Note that the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) 3B42 v7 precipitation product (hereafter referred to as TRMM; Huffman et al. 2007) is used for the 8 January 1999 event since it is not covered by the GPMF datasets.
Other selection criteria for the ground reference were the bias and root mean square error (RMSE) of the multiple-satellite products and the KAUST-RA relative to rain gauge measurements. The satellite products included the GPMF, the National Oceanic and Atmospheric Administration Climate Prediction Center Morphing technique (CMORPH; Joyce et al. 2004), the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN; Sorooshian et al. 2002), and the Global Satellite Mapping of Precipitation (GSMaP; Kubota et al. 2007). Details of these satellite products are given in Table 3. The biases and RMSEs were computed relative to the daily precipitation dataset collected from weather stations across Saudi Arabia by the Presidency of Meteorology and Environment between January 2000 and December 2012. From the stations with over 95% completion of their precipitation data, we extracted the cool-season precipitation events that accumulated 5 mm or more precipitation. Only 43 out of 208 stations met the data-completion threshold, and among the 12 cool seasons, 100 precipitation days met the 5 mm accumulated precipitation threshold. We computed the bias and RMSE at each station and averaged them across the 43 stations. The GPMF precipitation product, with the lowest bias and the second lowest RMSE (Fig. 3), was selected along with the KAUST-RA precipitation products as the ground reference.
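The station-based comparison above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Fig. 3: bias is taken as product minus gauge, the 5 mm threshold is applied to the gauge value on each day, and the station names and precipitation values are hypothetical.

```python
import math


def bias(estimates, observations):
    """Mean difference, assumed here to be product minus gauge."""
    return sum(e - o for e, o in zip(estimates, observations)) / len(observations)


def rmse(estimates, observations):
    """Root mean square error of the product relative to the gauge."""
    squared = sum((e - o) ** 2 for e, o in zip(estimates, observations))
    return math.sqrt(squared / len(observations))


def station_average_scores(product_by_station, gauge_by_station, threshold_mm=5.0):
    """Compute bias and RMSE per station on days meeting the gauge threshold,
    then average both scores across stations."""
    biases, rmses = [], []
    for station, gauge in gauge_by_station.items():
        pairs = [(p, g) for p, g in zip(product_by_station[station], gauge)
                 if g >= threshold_mm]
        if not pairs:
            continue  # no qualifying precipitation days at this station
        est, obs = zip(*pairs)
        biases.append(bias(est, obs))
        rmses.append(rmse(est, obs))
    return sum(biases) / len(biases), sum(rmses) / len(rmses)


# Hypothetical daily totals (mm) at two stations.
gauges = {"A": [10.0, 0.0, 20.0], "B": [6.0, 8.0, 1.0]}
product = {"A": [12.0, 3.0, 16.0], "B": [5.0, 11.0, 0.0]}
mean_bias, mean_rmse = station_average_scores(product, gauges)
```

Averaging per-station scores, as done here, weights each station equally regardless of how many qualifying days it contributes; pooling all days before scoring would be a different design choice.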

Analysis methods
As the model simulations were evaluated at every grid point in the domain, the spatial resolution of the modeled precipitation field was up-scaled (or interpolated) from 4 to 5 km (the resolution of KAUST-RA) and 0.1° (the resolution of GPMF) using the Earth System Modeling Framework "conserve" function within the National Center for Atmospheric Research (NCAR) Command Language (NCL). The raw ECMWF reforecast and the TRMM datasets were also interpolated to the resolutions of GPMF and KAUST-RA using the same function. The following analyses use both re-gridded precipitation datasets.

Average accumulated precipitation
To evaluate how effectively the convective-permitting-WRF (CP-WRF) predicts the convective extremes over the AP at sub-seasonal time scales, we first examined the accumulated precipitation. Here we use the precipitation with grid spacing interpolated to the GPMF's spatial resolution. The model-generated precipitation was evaluated at ± 1 day of the event (three days of accumulated precipitation) on the weather forecasting time scale W1, and at ± 3 days of the event (seven days of accumulated precipitation) on W2 and W3. The ensemble mean of each simulated event and the average precipitation during events associated with the extratropical and tropical regimes were then calculated. This procedure was repeated for the raw ECMWF precipitation. The precipitation forecasts of the extratropical and tropical events were compared by subtracting the average precipitation of the raw ECMWF from that of the CP-WRF. The CP-WRF average precipitation biases relative to the GPMF and KAUST-RA precipitations were also calculated. The differences and biases indicate the quality of the CP-WRF relative to the coarser-resolution raw ECMWF and observations.
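As a minimal sketch of the accumulation windows described above, the following example sums daily precipitation at a single grid point over ± 1 day (W1) or ± 3 days (W2 and W3) around the event day, averages across ensemble members, and takes the CP-WRF minus raw ECMWF difference. The two-member ensembles and their values are hypothetical, and the function names are illustrative only.

```python
def window_accumulation(daily_precip, event_index, half_width):
    """Sum daily precipitation from event_index - half_width to
    event_index + half_width (inclusive), clipped to the series bounds."""
    start = max(0, event_index - half_width)
    stop = min(len(daily_precip), event_index + half_width + 1)
    return sum(daily_precip[start:stop])


def ensemble_mean_accumulation(members, event_index, half_width):
    """Average the windowed accumulation across ensemble members."""
    totals = [window_accumulation(m, event_index, half_width) for m in members]
    return sum(totals) / len(totals)


# Hypothetical daily precipitation (mm) at one grid point; the event is day 3.
cp_wrf_members = [[0, 1, 5, 20, 4, 0, 0], [0, 0, 10, 30, 2, 1, 0]]
raw_ecmwf_members = [[0, 2, 3, 8, 2, 0, 0], [1, 1, 4, 10, 3, 0, 0]]

w1_cp_wrf = ensemble_mean_accumulation(cp_wrf_members, 3, half_width=1)
w1_raw = ensemble_mean_accumulation(raw_ecmwf_members, 3, half_width=1)
w2_cp_wrf = ensemble_mean_accumulation(cp_wrf_members, 3, half_width=3)
difference_w1 = w1_cp_wrf - w1_raw  # positive: CP-WRF is wetter than raw ECMWF
```

The same two functions would be applied at every grid point and then averaged over the events of each regime to obtain the regime-mean difference fields.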

Categorical statistics
The precipitation forecast skill was measured with three categorical statistics: the critical success index (CSI), probability of detection (POD), and false alarm ratio (FAR), as described in Wilks (2011). Here we use the precipitation with grid spacing interpolated to the GPMF's and the KAUST-RA's spatial resolutions. These verification metrics are intuitive because they are based on a 2 × 2 forecast contingency table with four entries: hits, misses, false alarms, and correct negatives. The CSI, POD, and FAR verification metrics are respectively calculated as

$$\mathrm{CSI} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{false alarms}}, \tag{1}$$

$$\mathrm{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}}, \tag{2}$$

$$\mathrm{FAR} = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}}. \tag{3}$$

We defined a precipitation event at a grid point as 20 mm of accumulated precipitation in three days for W1 forecasts and seven days for W2 and W3 forecasts. Considering the location uncertainty at these sub-seasonal time scales, we applied a neighborhood verification technique that considers the modeled events in ± 4 grid points (81 grid points in total), covering approximately 90 × 90 km in the GPMF's spatial resolution scale (approximately twice the non-interpolated raw ECMWF grid spacing) and around 45 × 45 km in the KAUST-RA's spatial resolution scale. The precipitation verification in the CPM simulations was then approximately within the effective resolution of the raw ECMWF data, alleviating the double-penalty problem of missing precipitation in correct locations and getting precipitation in wrong locations. A precipitation event is included in the contingency table if it meets the precipitation threshold and falls into one of the 81 grid points. This verification technique has been used in previous CPM studies (e.g., Risanto et al. 2021; Moker et al. 2018). The CSI, POD, and FAR values range from 0 to 1. As the CSI and POD values approach 1 (0), the precipitation forecast skills increase (decrease). In contrast, as the FAR values approach 1 (0), the precipitation forecast skills decrease (increase). Here we calculated 1-FAR to give consistent interpretations of the CSI, POD, and FAR.
To assess the modeled precipitation forecast skill, we subtracted the CSI, POD, and FAR values of each grid point in the raw ECMWF precipitation from those in the CP-WRF precipitation. At grid points with positive values, the CP-WRF demonstrated higher precipitation forecasting skill than the raw ECMWF. The statistical field significance was also computed. At each grid point, a two-tailed local significance test was established via a Monte Carlo resampling method with 1000 random permutations (p < 0.1). The statistical field significance was computed with 1000 random resamplings of the maps, as described in Livezey and Chen (1983). The critical value of the statistical field significance was set to 90%, the 900th value of the histogram. That is, only grid points containing at least 900 unique values were used. We emphasize that the methods in all of these analyses were previously demonstrated in convective-permitting NWP simulations in northwest Mexico (Moker et al. 2018; Risanto et al. 2021).
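The categorical scores and the neighborhood idea can be sketched as follows. The exact neighborhood bookkeeping is not spelled out in the text, so this example assumes one common variant: an observed event counts as a hit if any forecast event lies within ± radius grid points, and a forecast event counts as a false alarm only if no observed event lies within that neighborhood. The function names and the toy 4 × 4 fields are hypothetical; the study uses a radius of 4 grid points, whereas the toy example uses a radius of 1 so that the counts can be checked by hand.

```python
def any_event_nearby(field, i, j, threshold, radius):
    """True if any grid point within +/- radius of (i, j) meets the threshold."""
    rows, cols = len(field), len(field[0])
    for ii in range(max(0, i - radius), min(rows, i + radius + 1)):
        for jj in range(max(0, j - radius), min(cols, j + radius + 1)):
            if field[ii][jj] >= threshold:
                return True
    return False


def neighborhood_counts(forecast, observed, threshold=20.0, radius=4):
    """Count hits, misses, and false alarms under the assumed neighborhood rule."""
    hits = misses = false_alarms = 0
    for i in range(len(observed)):
        for j in range(len(observed[0])):
            if observed[i][j] >= threshold:
                if any_event_nearby(forecast, i, j, threshold, radius):
                    hits += 1
                else:
                    misses += 1
            if (forecast[i][j] >= threshold
                    and not any_event_nearby(observed, i, j, threshold, radius)):
                false_alarms += 1
    return hits, misses, false_alarms


def categorical_scores(hits, misses, false_alarms):
    """CSI, POD, and FAR from contingency counts (NaN if undefined)."""
    nan = float("nan")
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return csi, pod, far


# Hypothetical accumulated precipitation (mm) on a 4 x 4 grid.
observed = [[0, 25, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 30]]
forecast = [[22, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [21, 0, 0, 0]]

csi_n, pod_n, far_n = categorical_scores(*neighborhood_counts(forecast, observed, radius=1))
csi_p, pod_p, far_p = categorical_scores(*neighborhood_counts(forecast, observed, radius=0))
```

With radius 0 (point-to-point matching) the displaced forecast at the top-left corner is penalized twice, as a miss and as a false alarm; with radius 1 it is credited as a hit, which is the double-penalty relief the neighborhood approach is meant to provide.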
We also computed the frequency distribution of the domain-wide POD as a percentage of total number of grid points in the domains for the extratropical and tropical events at W1, W2, and W3. To calculate the statistical significance of the distribution, we generated 1000 random permutations of the POD distributions of the raw ECMWF and CP-WRF using the Monte Carlo technique. The statistical significance was set to 90%. The purpose is to show the forecast skill of the CP-WRF and its statistical significance.
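A Monte Carlo permutation test of the kind referred to above can be sketched as follows. This is a generic two-sample, two-tailed test on the difference of means with 1000 label shuffles, offered only as an illustration; the test statistic, the resampling unit, and the sample values are assumptions and may differ from those used in the study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(sample_a, sample_b, n_permutations=1000, seed=0):
    """Two-tailed permutation test on the difference of means.

    Labels are shuffled between the two samples; the p-value is the fraction
    of shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)


# Hypothetical POD values for two forecast systems over eight events.
pod_cp_wrf = [0.62, 0.70, 0.66, 0.74, 0.68, 0.71, 0.65, 0.69]
pod_raw_ecmwf = [0.31, 0.28, 0.35, 0.30, 0.33, 0.27, 0.36, 0.29]

p_value = permutation_p_value(pod_cp_wrf, pod_raw_ecmwf)
significant_at_90 = p_value < 0.1
```

A fixed seed is used so that the sketch is reproducible; with p < 0.1 the difference would be flagged as significant at the 90% level used in this work.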

Relative operating characteristic
In addition to the previously mentioned categorical indices (CSI, POD and FAR), we constructed the relative (or receiver) operating characteristic (ROC; Mason and Graham 2002) for determining the probability of precipitation forecasts at the sub-seasonal time scale. Here we use the precipitation with grid spacing interpolated to the GPMF's and KAUST-RA's spatial resolutions. Like the other standard probabilistic verification metrics, the ROC curve is a common verification index of operational forecasts and is suitable for ensemble forecast systems (Coelho et al. 2019). The methodology was first developed to classify the accuracy of radar signals during World War II, and its use has expanded to clinical diagnostic testing and screening (Zou et al. 2007). In forecast verification, the ROC curve classifies the forecasts into hits or misses based on the observations at a pre-determined threshold and is often plotted as the hit rate (POD) versus the false alarm rate in Cartesian coordinates. Being conditioned by the observations, the ROC curve in our case can be analyzed only during convective events. When the curve starts from the bottom left corner, ends at the top right corner, and bends toward the top left corner, the skill of the forecasting system is high. When the curve lies on or below the diagonal line joining the bottom left to the top right corner (the ROC curve of a random classifier), the forecasting system has no forecasting skill. The forecast skill is quantified by the area under the curve (AUC), which ranges from 0 to 1 (perfect forecast ability). An AUC of 0.5 corresponds to a random classifier, and any AUC below 0.5 means that the classifier performance is worse than random.
ROC curves are also suitable for forecast evaluations on sub-seasonal timescales. ROC curve analyses were performed by Hudson et al. (2011), who evaluated the potential of sub-seasonal forecasting of extreme temperatures in Australia in a T47 (3.7° × 3.7°) horizontal resolution model, and by Coelho et al. (2018), who validated sub-seasonal precipitation forecasts in South America using both the real-time forecasts and the reforecasts of ECMWF. Here, we computed the ROC curves of both the CP-WRF and raw ECMWF precipitation events. As with the aforementioned categorical statistics, we calculated the hit and false alarm rates for each ensemble member using ± 4-grid point neighborhood verification and a threshold of 20 mm accumulated precipitation over 3 days for W1 and 7 days for W2 and W3. The ROC curve of each event was obtained by averaging the values across the ensemble members.
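One plausible reading of the neighborhood verification is sketched below for a single ensemble member. It assumes that an observed exceedance counts as a hit when the forecast exceeds the threshold anywhere within ± 4 grid points, and that a forecast exceedance counts as a false alarm only when no observed exceedance lies within the same neighborhood. These matching rules and the function names are assumptions made for illustration and are not necessarily the exact rules applied in this study.

```python
def exceeds_nearby(field, i, j, threshold, radius):
    """True if any value within +/- radius grid points of (i, j) meets the threshold."""
    n_rows, n_cols = len(field), len(field[0])
    for a in range(max(0, i - radius), min(n_rows, i + radius + 1)):
        for b in range(max(0, j - radius), min(n_cols, j + radius + 1)):
            if field[a][b] >= threshold:
                return True
    return False

def neighborhood_rates(forecast, observed, threshold=20.0, radius=4):
    """Return (hit_rate, false_alarm_rate) for 2-D accumulated precipitation in mm."""
    hits = misses = false_alarms = correct_negatives = 0
    for i in range(len(observed)):
        for j in range(len(observed[0])):
            if observed[i][j] >= threshold:
                # Observed event: detected if the forecast exceeds the threshold nearby.
                if exceeds_nearby(forecast, i, j, threshold, radius):
                    hits += 1
                else:
                    misses += 1
            elif (forecast[i][j] >= threshold
                  and not exceeds_nearby(observed, i, j, threshold, radius)):
                # Forecast event with no observed event in the neighborhood.
                false_alarms += 1
            else:
                correct_negatives += 1
    n_event = hits + misses
    n_nonevent = false_alarms + correct_negatives
    hit_rate = hits / n_event if n_event else float("nan")
    fa_rate = false_alarms / n_nonevent if n_nonevent else float("nan")
    return hit_rate, fa_rate
```

Allowing a spatial tolerance in this way avoids penalizing a high-resolution forecast twice (one miss plus one false alarm) for a small displacement of a convective cell.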
Following previous studies (e.g., Hafez and Almazroui, 2016; Christidis and Stott, 2015), which reported a relationship between geopotential heights and extreme weather and climate, we evaluated the raw ECMWF 500 and 850 hPa geopotential heights relative to those of ERA-Interim using Pearson's correlation coefficient. Here, we applied the synoptic classification of Luong et al. (2020a). The spatial resolution of the ERA-Interim dataset is approximately 80 km on 60 vertical levels from the surface to 0.1 hPa (Berrisford et al. 2011). This evaluation quantifies the predictability of the sub-seasonal forecast based on the synoptic-scale patterns. At each grid point in the domain, the raw ECMWF geopotential height daily mean of the events was correlated with that of ERA-Interim. As the raw ECMWF reforecast is an 11-member ensemble dataset, the correlation relative to ERA-Interim was calculated for every ensemble member and averaged across the 11 ensemble members. The raw ECMWF geopotential height was qualitatively evaluated by plotting its daily mean and the synoptic-scale pattern determined in the Luong-SOM analysis.
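The grid-point correlation and ensemble averaging can be sketched as follows. The sketch assumes that, at each grid point, the correlation is taken across the events between one ensemble member and the reanalysis, and that the member correlations are then averaged. The nested-list layout and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length series.

    Assumes neither series is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ensemble_mean_correlation(members, reanalysis):
    """Per-grid-point correlation averaged over ensemble members.

    members[m][e][g]: daily-mean height of member m, event e, grid point g.
    reanalysis[e][g]: daily-mean height of the reanalysis for the same event.
    """
    n_points = len(reanalysis[0])
    result = []
    for g in range(n_points):
        reference = [event[g] for event in reanalysis]
        correlations = [pearson([event[g] for event in member], reference)
                        for member in members]
        result.append(sum(correlations) / len(correlations))
    return result
```

Averaging the member correlations, rather than correlating the ensemble mean, keeps the spread among members visible in the final value.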

Precipitation difference and biases
In the CP-WRF simulations, the average total precipitation in all synoptic-related events exceeded the raw ECMWF precipitation. The precipitation difference between the two was calculated after interpolating both to the GPMF's spatial resolution. Among the extratropical events (Fig. 4), the total precipitation notably increased at lead time W1 over the west coast of the AP, just north of Jeddah and the Zagros Mountains east of the Persian Gulf (also known as the Arabian Gulf) (see the difference plot in Fig. 4g). The precipitation also notably increased over the central AP, where the raw ECMWF generated drier conditions. At W2, the CP-WRF precipitation was again higher over Jeddah (as indicated by the star in Fig. 4), the central Red Sea, and the Zagros Mountains than the raw ECMWF precipitation (Fig. 4b and e), as shown in the difference plot (Fig. 4h). However, the CP-WRF produced drier conditions over the central AP than the raw ECMWF. Similarly to W2, the CP-WRF at W3 obtained higher total precipitation over the central Red Sea, near Jeddah, and the Zagros Mountains than the raw ECMWF (Fig. 4c, f, and i). Over the Red Sea, where the raw ECMWF barely generated precipitation at W3, the precipitation increase was notable.
Among the tropical events (Fig. 5), the average total precipitation of the CP-WRF at W1 was also higher than the raw ECMWF precipitation, especially over the central AP (see difference plot in Fig. 5g). The CP-WRF produced two separate precipitation maxima: one southwest of Jeddah, the other southeast of Jeddah. The city of Jeddah was drier in the CP-WRF model than in the raw ECMWF. At W2 in the raw ECMWF, the precipitation exceeded 8 mm over a large area of the central Red Sea (Fig. 5e), whereas in the CP-WRF, the precipitation was concentrated over the high terrain southeast of Jeddah to the central AP and the east coast of Sudan, exceeding 18 mm precipitation in total (Fig. 5b). In contrast, the central Red Sea was almost dry. Similar patterns were found at W3, but the precipitation signatures were weaker than at W2 (Fig. 5c). The high average precipitation at W1, W2, and W3 in the CP-WRF was likely attributable to convective organization at the meso-γ scale, which could be resolved in this model but not in the raw ECMWF, particularly over the complex terrain southeast of Jeddah.
Figure 6 shows the average accumulated precipitation of GPMF and KAUST-RA and the precipitation biases of the CP-WRF at W1, W2, and W3 in the extratropical events. The GPMF clearly gave more precipitation (> 20 mm) in the central AP than the KAUST-RA (Fig. 6a-c, g-i). The CP-WRF precipitation exhibited a dry bias (indicated by the blue shading) relative to the GPMF precipitation at W1, W2, and W3 (Fig. 6d-f) in the north of Jeddah, the central AP, and the Zagros Mountains. Conversely, it showed a wet bias (< 15 mm) (indicated by the red shading) in the central Red Sea at W2 and W3. Compared to the KAUST-RA precipitation, the CP-WRF consistently exhibited a dry bias over Jeddah and the Zagros Mountains at all lead times (Fig. 6j-l).
Figure 7 shows the observed precipitation and the CP-WRF biases during the tropical events. The KAUST-RA generated more precipitation (> 18 mm) than the GPMF, especially over the central AP (Fig. 7a-c, g-i). Compared to the GPMF, the CP-WRF exhibited a dry bias over Jeddah and the central AP that increased from W1 to W3 (Fig. 7d-f). Wet biases were notable over the east coast of Sudan and the southwestern tip of the AP (Fig. 7d-f). The dry bias of the CP-WRF was more pronounced relative to the KAUST-RA precipitation than relative to the GPMF, especially over the central AP (Fig. 7k, l). This result was expected, because the KAUST-RA generated more precipitation than the GPMF.
In summary, the CP-WRF generated more precipitation in the extratropical events than the raw ECMWF at W1, W2, and W3. This is shown in the differences, which were dominated by positive values over the central Red Sea, the central AP, and the Zagros Mountains. In the tropical events, the CP-WRF generated notable precipitation only in the southwest of Jeddah, the east coast of Sudan, and the central AP at W1. The raw ECMWF generated more precipitation over the central Red Sea at W2 and W3 than the CP-WRF. In general, the CP-WRF precipitation was lower than the GPMF and KAUST-RA precipitation, but higher than the raw ECMWF precipitation.

Precipitation categorical statistics
Figure 8 displays the differences between the CSI, POD, and FAR of the CP-WRF and raw ECMWF precipitation relative to the GPMF precipitation for the extratropical events. Consistent with the average CP-WRF precipitation, the precipitation forecast skill was consistently higher in the CP-WRF than in the raw ECMWF when domain-averaging was applied, irrespective of the forecast timescale. The CSI and POD differences were also field-significant (> 90%) from W1 to W3. At W1, the precipitation forecast skill was high (blue shading) over the entire Red Sea, the central AP, and the Zagros Mountains. At W2 and W3, the forecast skill was notably high over the central Red Sea including Jeddah and areas along the west coast of the AP, and also over the Zagros Mountains. The FAR results (here plotted as 1-FAR) were consistent with the CSI and POD results but were not field-significant.
Figure 9 displays the differences between the three verification metrics of the CP-WRF and raw ECMWF precipitation relative to the GPMF for the tropical events. These results are consistent with the average CP-WRF precipitation during the tropical events described in the previous subsection. At W1, the CP-WRF better forecasted the precipitation than the raw ECMWF (blue shading) over the central AP and a small part of the Red Sea, but underperformed around Jeddah and the central Red Sea (red shading). At W2, the forecast skill of the CP-WRF was higher only over the areas east of Jeddah, the central and northern AP, and the east coast of Sudan. Over the central Red Sea, the raw ECMWF generally outperformed the CP-WRF, possibly because the CP-WRF inadequately resolves the precipitation there. As mentioned earlier, the persistent SST might reduce the forecasting ability of the CP-WRF. Finally, at W3, the CP-WRF better forecasted the precipitation than the raw ECMWF in the central Red Sea, Jeddah, and the Zagros Mountains.
Fig. 4 Average total precipitation in the extratropical events obtained by CP-WRF (a-c) and the raw ECMWF (d-f), and their difference (CP-WRF minus raw ECMWF) (g-i). At the 1-week lead time (W1), the precipitation was accumulated for 3 days centered on the day of the event. At the 2- and 3-week lead times (W2 and W3, respectively), the precipitation was accumulated for 7 days centered on the day of the events. The star indicates the location of Jeddah.
Similar results were obtained using KAUST-RA as the ground reference. Figure 10 shows the CSI, POD, and FAR differences between the CP-WRF and raw ECMWF for the extratropical events. The CSI and POD differences were field-significant. At W1, the CP-WRF outperformed the raw ECMWF in the Red Sea, the central AP including the Jeddah area, and the Zagros Mountains. At W2 and W3, the enhanced precipitation forecast skill of the CP-WRF was notable in the central Red Sea, Jeddah, the west coast of AP, and the Zagros Mountains.
For the tropical events (see Fig. 11), the difference plots of the CP-WRF and raw ECMWF again indicate a higher precipitation forecast skill of the CP-WRF than of ECMWF, and field significance of the CSI and POD results. At W1, the CP-WRF outperformed the ECMWF over the area east of Jeddah and the central AP, and northwest of the Persian Gulf. At W2, the enhanced forecast skill was notable east of Jeddah, over some parts of the central and northern AP, and along the east coast of Sudan. In contrast, the forecast skill of CP-WRF was reduced in the central Red Sea. At W3, the CP-WRF outperformed the ECMWF in the central Red Sea and the west coast of AP, including Jeddah.
Figures 12 and 13 show the frequency distributions of the POD relative to the GPMF and the KAUST-RA for all week lead times in the extratropical and tropical events, respectively. In the extratropical events and at all lead times, PODs above 0.8 appeared more frequently in the CP-WRF than in the raw ECMWF regardless of the precipitation products being used in the analyses. At W2, the differences between CP-WRF and raw ECMWF were statistically significant (marked by yellow stars) only for PODs greater than 0.9, but at W1 and W3, they were statistically significant for PODs greater than 0.8. Among the tropical events (Fig. 13), PODs greater than 0.7 appeared more frequently in the CP-WRF than in the raw ECMWF at W1 regardless of the precipitation products being used in the analyses. They were also statistically significant (marked by yellow stars). At W2 and W3, the PODs greater than 0.9 appeared slightly more frequently in the CP-WRF than in the raw ECMWF, but the differences were not statistically significant. The PODs between 0.8 and 0.9 were more frequent in the raw ECMWF than in the CP-WRF for both analyses against the GPMF and KAUST-RA products. These seemed to reflect the raw ECMWF's high forecast skill over the central Red Sea as shown in Figs. 9 and 11.
Fig. 6 Mean accumulated precipitation of GPMF (a-c), CP-WRF precipitation bias relative to GPMF precipitation (d-f), mean accumulated precipitation total of KAUST-RA (g-i), and CP-WRF precipitation bias relative to KAUST-RA precipitation (j-l) for the extratropical events. At the 1-week lead time (W1), the precipitation was accumulated for 3 days centered on the day of the event. At the 2- and 3-week lead times (W2 and W3, respectively), the precipitation was accumulated for 7 days centered on the day of the events. Note that panel (b) is the same as panel (c), and panel (h) is the same as panel (i). The star indicates the location of Jeddah.
In summary, the CSI, POD, and FAR difference results were very similar for two different data sources of observed precipitation. Therefore, the value added by the CP-WRF is not a function of any particular observational precipitation product. The CP-WRF can better predict both extratropical and tropical precipitation events than the driving raw ECMWF model, up to three weeks ahead of the events. More importantly, the CP-WRF demonstrated higher forecast skill than the raw ECMWF around the area of Jeddah, the central Red Sea, and the high terrain in the west coast of AP, suggesting that the CP-WRF can improve the forecast of extratropical-driven extreme precipitation events. Moreover, these extreme precipitation events occurred at sub-seasonal time scales at the original raw ECMWF resolution, but the

ROC curves and AUC of precipitation
A 20 mm precipitation threshold was applied to each event in the ROC analysis. Figure 14 shows the ROC curves of the individual extratropical events relative to the GPMF precipitation for W1, W2, and W3. At W1, the raw ECMWF exhibited a high forecast skill in most of the events but was outperformed by CP-WRF in five events (with AUC values > 0.8). The higher forecast skill of the CP-WRF is consistent with the verification metrics analyzed in the previous subsection. At W2, the AUCs of the CP-WRF exceeded 0.5 for nearly all events, again outperforming the raw ECMWF.
Most importantly, the AUCs of more than 90% of the events at W3 were higher in the CP-WRF than in the raw ECMWF. Most of the AUCs in the CP-WRF were slightly above 0.5. The AUCs below 0.5 in the CP-WRF (two events) were higher than their counterpart AUCs in the raw ECMWF.
Figure 15 shows the ROC curves of the individual tropical events for all lead times. As with the extratropical events, the tropical events were well forecasted by the CP-WRF. At W1, the AUCs of all events were higher in the CP-WRF than in the raw ECMWF, although the AUC of one event was lower than 0.5 in the CP-WRF model. At W2, the AUCs of all events exceeded 0.5 in the CP-WRF, but were below 0.5 in the raw ECMWF. At W3, the CP-WRF and raw ECMWF again yielded notably different ROC curves. The AUCs exceeded 0.5 for almost all events in the CP-WRF (for the two exceptions, the AUC was ~ 0.4). In contrast, all events in the raw ECMWF yielded AUC values around 0.4.
Figures 16 and 17 show the ROC curves of the individual extratropical and tropical events, respectively, relative to the KAUST-RA precipitation for all lead times. Again, the precipitation prediction accuracy was higher in CP-WRF (AUC > 0.5) than in raw ECMWF at W1, W2, and W3. These results are consistent with the ROC curve analysis relative to the GPMF precipitation.
Fig. 8 Differences in the CSI, POD, and FAR metrics (relative to the GPMF precipitation) between the CP-WRF and raw ECMWF total precipitation driven by extratropical events at W1, W2, and W3. The total precipitation threshold was set to 20 mm. The blue (red) shading indicates high (low) forecast skill of the CP-WRF. The field-significance values displayed in the lower left of each panel were computed using the Monte Carlo technique with 1000 random permutations. The contour lines are the 500 hPa geopotential heights (in one-tenth of a kilometer) of the extratropical events based on the Luong-SOM analysis. The star indicates the location of Jeddah.
The average differences between the AUCs of the CP-WRF and raw ECMWF analyses relative to the GPMF and KAUST-RA precipitation data are shown in Tables 4 and 5, respectively. All differences are positive, confirming the higher precipitation forecast skill of CP-WRF than of raw ECMWF at W1, W2, and W3 in both extratropical and tropical events. Although the differences were slightly higher for the tropical events than for the extratropical events, whether the tropical events are more predictable than the extratropical events should not be decided from this ROC curve analysis alone, because the number of samples was statistically constrained and the ROC curve analysis covers the whole domain rather than regional areas such as Jeddah and the central Red Sea.
In summary, the precipitation forecast skill of CP-WRF was relatively higher at W1, W2, and W3 for both extratropical and tropical events, whereas the driving raw ECMWF exhibited almost no skill at W2 and W3. At W1, the raw ECMWF showed some forecast skill for many events (i.e., AUC > 0.5), but was consistently outperformed by the CP-WRF (see Tables 4 and 5). Therefore, we conclude that the CP-WRF improves the predictability of extreme precipitation events (> 20 mm) up to three weeks ahead of the events on sub-seasonal forecast time scales. According to the categorical statistics (CSI, POD, and FAR) in Sect. 3.2, the precipitation forecast skill over Jeddah and the central Red Sea is higher in the extratropical events than in the tropical events. Therefore, the extratropical events offer the main forecast opportunity for the CP-WRF at these sub-seasonal time scales.

Geopotential heights
Luong et al. (2020a) attempted to identify the dominant synoptic patterns underlying convective extremes over the AP as discussed in Sects. 2 and 3.3. They found that the interaction between the 500 and 850 hPa geopotential height fields closely influences the positioning of the RST and AA and convective organization over the AP. Therefore, we assessed the predictability of the precipitation forecasts by evaluating the synoptic-scale geopotential heights in the ERA-Interim and raw ECMWF at different forecast lead times. Figure 18 displays the Pearson's correlation-coefficient contours (color-coded shading) between the ERA-Interim and the raw ECMWF 500 hPa geopotential heights, overlaid with the 500 hPa geopotential heights from the Luong-SOM analysis (solid lines) and the daily means of the 500 hPa geopotential heights extracted from the raw ECMWF (dashed lines). The correlation coefficients at W1, W2, and W3 were consistently higher for the extratropical events (Fig. 18a, c, e) than for the tropical events, as shown by both the color-coded shading and the average correlation coefficients (Rs) across the domain. The difference was notable at W3 (Fig. 18e and f) over Jeddah and along the west coast of the AP, where R > 0.7 for the extratropical events and R < 0.5 for the tropical events.
The 500 hPa geopotential height contours were also consistent with the correlation coefficients. At W1 (Fig. 18a), the raw ECMWF afforded extratropical 500 hPa geopotential contours that almost matched those of the Luong-SOM analysis, but its tropical 500 hPa geopotential contours (Fig. 18b) notably differed from those of the Luong-SOM analysis. At W2 and W3, the differences between the contours of the tropical events became more pronounced because the trough over the east coast of Africa was not developed (Fig. 18d, f). In contrast, the extratropical 500 hPa geopotential contours of the raw ECMWF almost matched those of the Luong-SOM analysis. The raw ECMWF did forecast the trough at W2 and W3, but at a weaker magnitude than that in the ERA-Interim.
[…] Fig. 8 but the verification metrics were computed relative to the KAUST-RA precipitation and results are shown for the extratropical events. These results are consistent with those relative to the GPMF precipitation.
The 850 hPa geopotential heights presented higher correlations at W2 and W3 in the tropical events than in the extratropical events (Fig. 19c-f), whereas at W1, the correlation coefficients of both types of events were almost identical. The correlation-coefficient difference was large at W3 (0.06 for the extratropical events versus 0.34 for the tropical events, also evidenced by the shading). In the tropical events, the raw ECMWF at W2 and W3 (Fig. 19d, f) apparently predicted an RST that closely matched both the ERA-Interim and the Luong-SOM analysis. The AA was also predicted (albeit weakly, with R < 0.5) at W2 and W3. However, in the extratropical events, the raw ECMWF poorly predicted the dominant extratropical trough over the Red Sea at W2 and W3 (Fig. 19c, e). Instead, it presented a weak signature of the RST over the east coast of Sudan, probably because the SST is warmer in the raw ECMWF than in the ERA-Interim.
The correlation coefficients of the extratropical and tropical 500 hPa geopotential heights were compared by Student's t test. The differences at W1, W2, and W3 were statistically significant at the p < 0.05 level. However, when the correlation coefficients of the 850 hPa geopotential heights were analyzed by the same test, the differences were statistically significant only at W2 and W3. The non-significant difference at W1 was consistent with the similar correlation coefficients at W1. Whether extratropical events over the AP are more predictable at sub-seasonal time scales than tropical events is difficult to conclude from the above results. We hypothesize that the 850 hPa geopotential height is more directly affected by the rapid changes of near-surface variables, such as the SST, moisture fluxes, and near-surface temperature, than the 500 hPa geopotential height. Therefore, at 850 hPa, the forecast contains more uncertainty at sub-seasonal time scales than at 500 hPa. Consequently, the predictability of the 850 hPa geopotential heights disagreed with that of the 500 hPa geopotential heights based on the event classification. An investigation with more extreme event samples is needed. However, previous studies (e.g., Hafez and Almazroui 2016; Christidis and Stott 2015) have heavily relied on the 500 hPa geopotential heights for diagnosing extreme events because 500 hPa is the level of non-divergence and is commonly used in extreme weather and climate diagnostics. Therefore, considering only the predictability of the 500 hPa geopotential heights, the categorical statistics (CSI, POD, FAR), and the ROC analysis, we conclude that extreme precipitation at sub-seasonal time scales (W1 to W3) is more predictable in the extratropical synoptic regime than in the tropical synoptic regime.

Discussion
Our results were constrained by the ensemble size, the number of events, and the reduction of the ECMWF spatial resolution after day 15. With only 11 members in the reforecast dataset, the statistical analyses are less robust than analyses with at least 30 members. Thirty is commonly considered the threshold sample size for meaningful statistics (Wilks 2011). Increasing the ensemble size is quite feasible because the ECMWF generates 51-member ensembles through twice-weekly real-time forecasts. Additional uncertainties at two- and three-week lead times may be associated with the shift in the ECMWF spatial resolution from 0.2° to 0.4° after day 15. Increasing the ensemble members, number of events, and spatial resolution would certainly increase the statistical robustness of our findings, but risks exhausting the computing resources. In any event, our results demonstrate that the precipitation forecast skill at sub-seasonal time scales can be improved with CPM. Dynamic downscaling of the raw ECMWF clearly adds value to extreme precipitation forecasts by increasing the forecast skill and predictability to two- and three-week lead times, especially over the central Red Sea, Jeddah, and the west coast of AP. Increasing the ensemble size is unlikely to change this essential conclusion.
Fig. 12 Frequency distributions of POD relative to the GPMF (left column) and the KAUST-RA (right column) as percentages of total grid points in the domain at W1, W2, and W3 for the extratropical events. The precipitation threshold is set to 20 mm. Bars marked with stars indicate that the frequency is statistically significant (> 90%) based on the Monte Carlo technique with 1000 random permutations. Note that the percentages of PODs > 0.8 are higher in the CP-WRF than in the raw ECMWF at all week lead times regardless of the precipitation products. They are statistically significant at W1 and W3. Only PODs > 0.9 are statistically significant at W2.
As mentioned in Sect. 3.4, the rain gauge measurement dataset on which the satellite products are evaluated is limited both temporally (2000-2012) and spatially (43 sites sparsely distributed over the AP). This dataset includes only 11 events for our study, preventing a statistical analysis of the site-equivalent model precipitation relative to the rain gauge measurements. Moreover, the GPMF precipitation product has a large RMSE and bias relative to the rain gauge measurements, which poses an accuracy problem. Many studies (e.g., Kim et al. 2017; Tan and Duan 2017; Zhang et al. 2018) have demonstrated that the GPMF better obtains the daily precipitation and its spatial distribution than the TRMM and other satellite precipitation products, but only for the GPMF precipitation data collected after 2014. The dataset prior to 2014 is based on the TRMM Combined Ku Radar-Radiometer Algorithm (TRMM CORRA) product with fewer microwave channels and radar data than the GPM CORRA. The use of earlier GPMF data might lower the accuracy of the precipitation histogram and affect the mean rate (Huffman 2019). Therefore, our verification metrics and ROC curves might be slightly less accurate for the events prior to 2014 than for those after 2014. This problem reinforces the need for evaluating the model precipitation against the 5 km-resolution KAUST-RA product. As the results of the GPMF and KAUST-RA products were qualitatively the same, we are confident that the value added by our CPM is real and is unaffected by the vagaries of the observed precipitation products used in the analysis.
In this work, the 20 mm threshold was considered as the extreme precipitation threshold for the AP, especially as desert soil does not easily soak up water (Almazroui 2011). We found that the forecast skill of CP-WRF over the central Red Sea and near Jeddah still outperformed that of raw ECMWF even when the precipitation thresholds were set to 30 and 35 mm (data not shown). Again, there is potential for CPM to improve sub-seasonal precipitation forecasts over the AP at a three-week lead time. This improvement is possible for extratropical events.
Fig. 13 Similar to Fig. 12, but for the tropical events. Note that the percentage of PODs > 0.7 is higher in the CP-WRF than in the raw ECMWF at the one-week lead time. The differences are also statistically significant against both precipitation products. At two- and three-week lead times (W2 and W3, respectively), the slightly higher percentages of the raw ECMWF in PODs > 0.7 reflect its higher forecast skill over the central Red Sea as shown in Figs. 9 and 11.
SST plays an important modulating role in the weather of this region (Hoteit et al. 2021; Sun et al. 2019). The drier conditions in the central Red Sea at W2 and W3 during the tropical events were probably caused by the specification of the modeled SST, which does not change significantly at sub-seasonal time scales. Low atmospheric instability is unfavorable for convective organization. The SST can influence extreme precipitation by up to 20% (Orth and Seneviratne 2017) and a cooler SST can suppress convective organization, especially in gulf-type water bodies (e.g., Mitchell et al. 2002). An SST bias correction might alleviate this problem, which affected the precipitation forecast over the Red Sea. The benefit of SST bias correction for convective organization has been demonstrated in a dynamically downscaled climate forecast reanalysis (Carrillo et al. 2017). We note that the SST in our simulations was extracted from the ECMWF S2S reforecast dataset, and was not bias-corrected beyond the time scale of medium-range weather forecasts (15 days). Thus, during tropical events when the strong RST brings warm tropical air masses to the Red Sea region (Luong et al. 2020a), the CP-WRF probably adjusted the SST inadequately and suppressed the convective organization. A more dynamic treatment of the SST in the Red Sea and a coupled ocean model should be investigated in future CPM simulation studies.
Finally, given its potential to improve extreme precipitation forecasts at sub-seasonal time scales, our dynamically downscaled modeling approach can broaden the range of applications in the region. One potential application is MCS tracking based on the modeled cloud-top temperature and precipitation on sub-seasonal time scales. Feng et al. (2012) demonstrated MCS tracking with a precipitation-feature-based algorithm that inputs the satellite infrared-brightness temperature data and uses some pre-defined parameters. The same algorithm could be applied to MCS tracking over the AP using the CP-WRF model output at sub-seasonal time scales. The results would approximately identify the areas at highest risk of MCS-driven precipitation with one- to three-week lead times, providing early warnings of extreme weather events within an object-based context.

Summary and conclusions
Interest in sub-seasonal weather prediction has grown in recent years because it can potentially provide early extreme weather warnings beyond the weather forecast timescale. Ultimately, we hope to improve the preparedness capability of detecting extreme weather events over the AP, where occasional heavy precipitation in the cool season causes substantial loss of life and damage to infrastructure. Applying CPM, we here demonstrated that sub-seasonal forecasts of extreme precipitation over the AP can be potentially improved during the cool season (November-April). We identified the regions and periods of forecast opportunity and investigated the predictability of synoptic-scale forcing at sub-seasonal time scales.
Using the ECMWF S2S reforecast dataset as the lateral boundary condition of the CPM, we simulated 10 extreme precipitation events associated with an extratropical synoptic regime and eight extreme precipitation events associated with a tropical synoptic regime in Jeddah from 1999 to 2018. The results of the extratropical events were evaluated relative to the GPMF and KAUST-RA precipitation products based on three categorical statistics: CSI, POD, and FAR. The CP-WRF exhibited a high precipitation forecast skill at lead time W1 over the Red Sea, Jeddah, the central AP, and the Persian Gulf. At W2 and W3, the high forecast skill was maintained mainly around Jeddah, the central Red Sea, and along the west coast of the AP. The CSI, POD, and some of the FAR results were field-significant. For the tropical events, the precipitation forecast skill of CP-WRF was low over the Jeddah area and the central Red Sea, but high over the central AP at W1. The precipitation forecast skill over the Jeddah area and the central Red Sea remained low at W2, but was high in the central and northern AP. At W3, the forecast skill of CP-WRF was high near Jeddah and the central Red Sea. This high forecast skill of the CP-WRF was consistently reflected in the ROC curve analysis. At W1, the ROC curves of the extratropical and tropical events obtained by CP-WRF (unlike those of the raw ECMWF) were above the no-skill diagonal (AUC = 0.5), with AUC values greater than 0.59. At W2 and W3, the forecast improvement of the CP-WRF was also notable. The ROC curves of more than 50% of the events simulated in the CP-WRF were above the no-skill diagonal. The average differences, obtained by subtracting the AUC of the raw ECMWF from the AUC of the CP-WRF, were all positive. Clearly, the CP-WRF can potentially improve the forecasting of extreme precipitation events associated with both extratropical and tropical synoptic regimes over Jeddah and the central Red Sea during the cool season, with lead times up to three weeks.
The correlation coefficients between the raw ECMWF and ERA-Interim 500 hPa geopotential heights were higher for the extratropical events than for the tropical events at W1, W2, and W3. Over Jeddah and the west coast of AP, the correlation coefficients were also higher for the extratropical than for the tropical events at W3. Moreover, the 500 hPa geopotential contours of the raw ECMWF more closely matched those of the ERA-Interim for the extratropical events than for the tropical events. In contrast, the same analysis on the 850 hPa geopotential heights showed higher correlation coefficients at W2 and W3 for the tropical events than for the extratropical events. At W3, the correlations were much lower (< 0.1) in the extratropical events than in the tropical events. Considering the high uncertainty in the 850 hPa geopotential height forecasts at sub-seasonal time scales and the analysis results of the categorical statistics and ROC, we concluded that the extreme precipitation events associated with the extratropical synoptic regime are more predictable than those associated with the tropical synoptic regime at sub-seasonal time scales.
Fig. 18 Pearson's correlation coefficient (shading) between the raw ECMWF and the ERA-Interim 500 hPa geopotential heights. The dashed and solid contours are the average daily means of the raw ECMWF 500 hPa geopotential heights and the 500 hPa geopotential heights of the Luong-SOM analysis, respectively. Both are resolved to one-tenth of a kilometer. The R values are the average correlations across the domain. The star in each panel indicates the location of Jeddah.
Although this study was constrained by the small number of events and the small ensemble size, it demonstrates the potential improvement in forecasting extreme cool-season precipitation over the AP at sub-seasonal time scales. If this potential is realized, we could establish a forecast system for extreme weather warnings beyond the weather forecast timescales over the AP. Moreover, this S2S downscaling technique is expected to be applicable in other parts of the world, especially regions with similar climatic characteristics to the AP such as northern Chile, the southwest United States, and northwestern Mexico. It should also be useful in regions subjected to an increasing frequency of extreme precipitation events driven by climate change. Issuing extreme weather warnings with three- to four-week lead times could prevent casualties and socioeconomic loss. A long-term dynamically-downscaled CPM reforecast is imminent. We will generate a 20-year database of sub-seasonal reforecasts from 1999 to 2020 at the same convective-permitting resolution. This database will include more cool-season precipitation events than the present dataset, and an additional four-week lead time for event forecasting. Increasing the number of precipitation events will improve the sample size in our statistical analyses. In a future analysis, we expect to demonstrate the potential improvement of sub-seasonal forecasts more robustly and to extend the lead time to four weeks.

Fig. 18 but for the 850 hPa geopotential heights
Funding
This work is supported by a competitive research grant from the King Abdullah University of Science and Technology with sub-award agreement OSR-2018-CRG7-3706.2. This work is based on S2S data. S2S is a joint initiative of the World Weather Research Programme (WWRP) and the World Climate Research Programme (WCRP). The original S2S database is hosted at ECMWF as an extension of the TIGGE database. We thank the editor and the two anonymous reviewers for their constructive feedback.

Data availability
All the data used in this study are stored on the Shaheen supercomputer at the King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. The data are available upon request.

Conflict of interest
The authors declare that they have no conflict of interest.