WRF gray-zone dynamical downscaling over the Tibetan Plateau during 1999–2019: model performance and added value

The Tibetan Plateau (TP) is an important component of the global climate system, while the characteristics of its climate are poorly represented in most regional climate models at coarse resolutions. In this study, a 20-year (2000–2019) dynamical downscaling simulation at the gray-zone resolution (9 km) using the WRF model driven by the ERA5 reanalysis is conducted over the TP. Based on comparisons against in-situ observations and the Integrated Multi-satellite Retrievals for GPM (IMERG) version 6 satellite precipitation product, the assessment of basic climate variables, such as near-surface air temperature (T2m) and precipitation, is performed to evaluate the model’s performance and understand its added value better. Results show that both WRF and ERA5 can successfully reproduce the spatial patterns of annual mean and seasonal mean surface air temperature. However, significant cold and wet biases are found especially over the southeastern TP in ERA5, which are greatly improved in WRF with reduced RMSEs. Not only the climatological characteristics, but also the inter-annual variability and seasonal variation of T2m and precipitation are well captured by WRF which reduces the cold and wet biases especially in winter and summer compared to ERA5, respectively. Besides, at daily scale, the overestimation of precipitation in WRF and ERA5 is mainly caused by the overestimated precipitation frequency when precipitation intensity changed slightly. Furthermore, WRF outperforms ERA5 in capturing the diurnal variation of precipitation with more realistic peak time in all sub-regions over the TP. Further investigation into the mechanism of model bias reveals that less simulated snow cover fraction plays a crucial role in increasing the surface net energy by affecting surface albedo over the southeastern TP in WRF, leading to higher T2m. In addition, less water vapor transport from the southern boundary of TP leads to reduced wet bias in WRF, indicating that the added value in dynamical downscaling at gray-zone resolution is obtained by representing water vapor transport more realistically.


Introduction
The Tibetan Plateau (TP), also called the "Third Pole", is the highest and most extensive highland in the world, and exerts a great influence on regional and global climate through its thermal and dynamical forcing mechanisms Wang et al. 2018a;Nan et al. 2021;Wen et al. 2022). TP has a tremendous impact on Asian Summer Monsoon (ASM) through elevated heating and forced topographic Rossby waves (Ge et al. 2017;He et al. 2019;Son et al. 2019;Seok and Seo 2021;Li et al. 2022a). It is also called the "Asian water tower", which contains the largest number of glaciers outside the polar regions (Yao et al. 2012). The extremely high and varied topography and complex land surface conditions over the TP resulted in the varied regional and local climates. And the TP has warmed up in a higher trend than the rest of the world in the past decades Yao et al. 2019Yao et al. , 2022. Recently, climate change over the TP has gained growing attention due to its significant impacts on climate system and regional hydrological cycle. Accurate regional climate characteristics are urgently needed to 1 3 better understand climate change over TP. However, due to the steep terrain and harsh environment, the distribution of observation stations is very uneven and sparse over the TP. Most of the meteorological stations are located in the eastern TP, and few can be found over the north-western TP. Thus, climate models, including global climate models (GCMs) and regional climate models (RCMs), are essential tools to study climate change over the TP (Gao et al. 2015;Fu et al. 2021;Zhang and Gao 2021).
GCMs are widely used in climate simulation and projection over the TP. Lun et al. (2021) evaluated the simulation performance of precipitation and temperature with GCMs from Coupled Model Inter-comparison Project Phase 5 and 6 (CMIP5 and CMIP6) over the TP. The results showed that GCMs tend to overestimate precipitation and underestimate surface air temperature, and GCMs of CMIP6 perform better than those of CMIP5. Although GCMs can resolve large-scale climate systems, they still have limitations in reproducing local and regional climate, especially over regions with complex terrain such as the TP, due to their coarse resolution (Yue et al. 2016;Jiang et al. 2016;Chen and Frauenfeld 2014;Duan et al. 2013;Salunke et al. 2019;Xu et al. 2017a). With finer horizontal resolution and better-resolved regional characters, such as local topography and land surface conditions, RCMs can well reproduce the regional to local climate (Giorgi 2019;Xu et al. 2019;Gutowski et al. 2020). RCMs have been widely used in simulating and projecting regional climate change over the TP (Gao et al. 2015;Wang et al. 2018a;Niu et al. 2021). Gao et al. (2015) conducted a 30-year continuous simulation over the TP using the Weather Research and Forecasting Model (WRF, https:// www. mmm. ucar. edu/ weath er-resea rch-and-forec asting-model) with a 30-km spatial resolution and showed smaller biases against observations compared with global reanalysis forcing. By using multiple RCMs from the Coordinated Regional Climate Downscaling Experiment (CORDEX) at the resolution of 50 km, Guo et al. (2018) evaluated the RCMs' performance in reproducing surface air temperature and precipitation changes over TP, and found that RCMs have cold and wet biases respectively. Driven by the Community Climate System Model (CCSM), Zhang and Gao (2021) used the WRF model to study the projected changes in precipitation recycling over the TP, and showed increasing trends of precipitation recycling ratio changes with elevation under the RCP4.5 and RCP8.5 scenarios.
Over the TP with steep terrain, the choice of horizontal resolution is very important for regional climate simulation, especially for precipitation and water vapor transport simulations (Xu et al. 2018;Lin et al. 2018). Karki et al. (2017) quantified the added value of convection-permitting (CP) (≦4 km) regional climate simulations over the Himalayas, and suggested added value of the convective-scale resolutions in realistically resolving the topo-climates over the central Himalayas. Recently, studies have shown that regional climate modeling at the convection-permitting scale can well simulate the precipitation distribution and diurnal cycle over the TP (Gao et al. 2020;Li et al. 2021;Zhou et al. 2021;Ma et al. 2022). However, the CP simulations over the TP require huge computing resources, and currently limit the long-term climate change simulations. The resolution of approximately between 15 and 4 km (such as 9 km) is the so-called gray-zone scale, at which scale clouds and convective transport can be partially and explicitly represented (Shin and Hong 2013). Many studies have done simulations at 9 km (Maussion et al. 2014;Ou et al. 2020;Wang et al. 2020;Lin et al. 2018;Huang et al. 2021). Lin et al. (2018) investigated the effect of model resolution on regional climate simulation over the TP with three spatial resolutions (30, 10, and, 2 km, respectively), and suggested that a resolution of approximately 10 km can strike a balance between a more spatially detailed simulation and expensive computational cost. Compared to gray-zone resolution simulation, the CP simulation usually needs several tens of computing cost and storage. When the spatial resolution is approximately 9 km, which is within the gray-zone grid spacing, the cumulus parameterization scheme (CPS) can be used or ignored. Ou et al. (2020) conducted four WRF simulations at 9 km resolution to evaluate the simulation of the precipitation diurnal cycle and found that all the experiments with CPS overestimate the amount of precipitation, and the experiment without CPS outperforms the others. Although several regional climate simulations at gray-zone or convectionpermitting scale have been performed over the TP, most of the experiments were conducted over a relatively short period (months) or parts of TP Zhao et al. 2021;Zhou et al. 2021). Due to the significant role of TP in global and regional atmospheric circulation, and the non-negligible effects of global warming, it is essential to generate an accurate long-term dataset with fine resolution (at gray-zone or CP scale) covering the whole TP and its surrounding areas.
In this paper, driven by the fifth generation of atmospheric reanalysis from Europe Centre for Medium-Range Weather Forecasts (ECMWF) (ERA5) (Hersbach et al. 2020(Hersbach et al. ), 20-year (1999(Hersbach et al. -2019 long-term dynamical downscaling at gray-zone (9 km) scale is conducted using the WRF model with the year 1999 as the spin-up time. The aim of this study is to evaluate the performance of long-term regional climate simulation at the gray-zone scale over the TP, and to assess the possible reasons for model biases. This study is organized as follows, Sect. 2 describes the model, data, and experimental design. Section 3 presents the evaluation results compared 1 3 with observations. The possible reasons for model biases are discussed in Sect. 4, and this study is summarized in Sect. 5.

Model and experimental design
The state-of-the-art non-hydrostatic WRF model (version 4.1.1) was used in this study (Skamarock et al. 2019). The model domain using Lambert projection is centered at 33° N, 88.5° E, with 531 × 361 grid points in the east-west and south-north direction at 9 km horizontal resolution (WRF9), respectively ( Fig. 1), covering TP and its surroundings areas. Forty vertical unevenly distributed levels were defined from the surface to the model top at 50 hPa. The initial and boundary conditions are from the ERA5 global reanalysis, which has a horizontal resolution of 0.25° × 0.25° and every 3 h were downloaded. The model was integrated continuously for 21 years, starting from 1 Jan 1999 to 31 Dec 2019, and the first year (1999) was used as spin-up time.
Based on the previous study (Ou et al. 2020) over the TP, the following physical parameterization schemes were used in this study: RRTMG long-wave and short-wave radiative transfer scheme (Iacono et al. 2008), the Unified Noah land surface model (Tewari et al. 2004) (NOAH), Yonsei University (YSU) planetary boundary layer (PBL) parameterization (Hong et al. 2006) and Thompson microphysics parameterization scheme (Thompson et al. 2008). The cumulus parametrization scheme was switched off in the WRF simulation at gray-zone resolution (9 km), which is similar to that of Ou et al. (2020) and Sun et al. (2021). To prevent the WRF simulation from drifting away from the global reanalysis forcing, the spectral nudging (SN) method, which is considered an indirect assimilation method (von Storch et al. 2000), was applied to wind fields above the PBL. SN is an efficient method to improve the performance of dynamical downscaling in WRF (Tang et al. 2017;Mai et al. 2020). SN can force the model close to the long waves of the driving data, whereas adding values in the smaller scales (Miguez-Macho et al. 2005). The wavenumber of 4 is employed in zonal and meridional directions, which represents that the large-scale circulation with a wave-length larger than about 1000 km will be nudged, and the nudging coefficient of 3.0 × 10 -4 was used.

Validation dataset
To evaluate the accuracy of the regional climate products produced by WRF, several observation datasets including insitu observations, satellite precipitation and snow products, and the ERA5 reanalysis dataset were used.
The daily in-situ observations used in this study, which include daily near-surface air temperature (at 2 m above ground level, T2m), maximum/minimum surface air temperature (Tmax/Tmin), and daily precipitation, are provided by the data service center at China Meteorology Administration (CMA, http:// data. cma. cn/ en). There are 143 observation stations available over the TP, and most of them are located in eastern TP (Fig. 1). The station observations have gone through a quality control procedure. The standard normal homogeneity test (SNHT, Alexandersson and Moberg (1997)) was applied to T2m, Tmax, and Tmin, and 6 stations with perceptible shifts were corrected.
To better evaluate the simulated precipitation characteristics especially the diurnal cycle of precipitation, the Integrated Multi-satellite Retrievals for GPM (IMERG) version 6 satellite precipitation product, which has high spatial (0.1° × 0.1°) and temporal (30 min) resolutions, is also used (Ma et al. 2022;Li et al. 2022a;Zhang et al. 2018). The IMERG Final Run data uses monthly precipitation gauge analysis from Global Precipitation Climatology Center (GPCC) and ancillary data from the ECMWF for calibration. Therefore, IMERG is considered one of the most reliable products that are suited for research studies (Huffman 2015). In previous studies, IMERG has been extensively studied and evaluated against ground-based, satellite, and reanalysis products around the world, and results all indicate that IMERG can provide reliable precipitation characteristics for research in the TP region (Pradhan et al. 2022;Tang et al. 2020;Li et al. 2022b). The snow cover fraction (SCF) dataset from the Interactive Multi-sensor Snow and Ice Mapping System (IMS), which is provided by the U.S. National Ice Center (USNIC) for the Northern Hemisphere (Ramsay 2000), is also used to compare with the WRF9 simulated snow cover over the TP. The IMS data has a temporal resolution of 24 h and a spatial resolution of 24 km.
For the comparison of surface air temperature between WRF9, ERA5 reanalysis and, in-situ observations, the WRF9 results and ERA5 reanalysis data are interpolated to observation stations using nearest interpolation. Due to the difference in topography height between WRF9 grids and station locations, the annual mean lapse rate (LR) of a specific region (− 4.15, − 4.02, and − 5.06 K/km for stations located in western TP, northeastern TP, and southeastern TP, respectively) is used to correct biases of WRF9 surface air temperature after interpolation (Wang et al. 2018b). Three sub-regions were selected for detailed analysis, which are northeastern TP (TP-NE, 37 stations), central TP (TP-C, 23 stations), and southeastern TP (TP-SE, 71 stations) (Fig. 1). TP-NE covers the Qaidam Basin, TP-SE represents areas with sharp changes in terrain altitude, and TP-C contains most of the remaining in-situ stations.

Evaluation of surface temperature and precipitation over the TP
The general ability of the WRF model to reproduce the mean surface climatology is firstly assessed by comparing the simulation results with the in-situ observations during the period of 2000-2019. Figure 2 shows the 20-year (2000-2019) averaged biases of annual, summer (JJA) and winter (DJF) mean T2m, Tmax, and Tmin. For annual mean T2m (Fig. 2a, d), both WRF9 and ERA5 have underestimated T2m by 2-4 ℃ over regions south of 35° N, but slightly overestimated T2m over Qaidam Basin. The WRF9 clearly underestimates the annual mean Tmax over the TP, while larger cold biases can be found in ERA5 with the largest bias reaching -8 ℃ over the TP-SE region. The spatial distribution of summer mean temperature biases is quite similar to that of the annual mean  with lower biases. In winter, ERA5 clearly shows larger cold biases over all sub-regions as shown in Fig. 1, especially for Tmax with the largest cold bias reaching -9 ℃. Table 1 lists the statistical skill scores, including the root mean square error (RMSE), the spatial correlation coefficient (SCOR), and the absolute mean bias of 20-year averaged annual and seasonal mean T2m, Tmax, Tmin, and, precipitation of the WRF9 and ERA5 against in-situ observations over the TP. It can be found that both WRF9 and ERA5 can reproduce the spatial patterns of surface air temperature with the SCORs all above 0.87, and WRF9 outperforms ERA5 in simulating temperature with higher SCORs and lower RMSEs. Figure 3 shows the spatial distribution of annual and summer mean precipitation biases from WRF9 and ERA5 over the TP compared to in-situ observations and IMERG, respectively. Compared to the in-situ observations, WRF9 can well simulate the distribution of annual mean precipitation with the SCOR at 0.84 and RMSE at 0.43 mm/day, but it slightly underestimates precipitation over south TP-C. On the contrary, obvious wet biases can be found over the whole TP in ERA5 and the RMSE reaches 1.41 mm/day which is about triple that of WRF9. The spatial patterns of summer mean precipitation biases are quite similar to that of the annual mean, and WRF9 can significantly reduce the wet biases which exist in ERA5, especially at the south edge of TP. Lin et al. (2018) showed the horizontal resolution of 10 km can represent the barrier effect of high mountains and the channeling effect of valleys compared to 30 km, and 2 km can reproduce more details but needs expensive computing costs. In our study, it is also found that most wet or dry bias of precipitation and water vapor from WRF9 compared to ERA5 are consistent with the higher or lower altitude at the south of TP. Therefore, gray-zone scale is the optimum resolution that simulates the entire TP, striking the balance between simulation performance and computing cost. Our gray-zone simulation also can support guidance for convection-permitting model (CPM) long-term simulation in the future. As shown in Table 1, it is clear that WRF9 outperforms ERA5 in reproducing seasonal precipitation with higher SCOR and lower RMSE and bias, indicating the added value in dynamical downscaling at the gray-zone resolution over the TP.

Inter-annual and seasonal variation of surface temperature and precipitation
Figures 4 and 5 depict RMSE and temporal correlation (TCOR) of annual mean T2m and precipitation at each station between WRF9 simulation, ERA5 reanalysis, and in-situ observations for the period 2000-2019. WRF9 can well simulate the inter-annual variation of T2m with most of TCORs/RMSE higher/lower than 0.6/1.0 ℃, respectively, while ERA5 produces larger RMSEs especially over TP-C where the RMSEs of the inter-annual variation of T2m is larger than 2.6 ℃, and lower TCORs also exist over there. For precipitation, although WRF9 and ERA5 show similar performance in simulating the inter-annual variation with most TCORs at about 0.5, WRF9 reduces the RMSEs especially over southern TP. The inter-annual variations of SCORs and RMSEs of seasonal mean T2m and precipitation between WRF9, ERA5 reanalysis, and in-situ observations are shown in Figures s1 and s2. For T2m, the performance varies greatly in different seasons. The SCORs between WRF9, ERA5 and in-situ observations are higher than 0.9 in summer and autumn, and the RMSEs are all below 3 ℃ with the minimum RMSEs in summer. In spring and winter, both WRF9 and ERA5 show larger RMSEs ranging from 2.5 to 4.5 ℃, and WRF9 outperforms ERA5 with higher SCORs and lower RMSEs, especially in winter. For seasonal precipitation, WRF9 generally outperforms ERA5 with comparable SCORs and significantly lower RMSEs. WRF9 and ERA5 show comparable SCORs ranging from 0.7 to 0.8  (2000-2019) annual (a-f), summer (g-l) and winter (m-r) averaged daily mean (T 2m ), maximum (T max ), and minimum (T min ) 2 m above ground level (AGL) temperature between WRF9, ERA5 and in-situ observation, unit: ℃ in spring, and WRF9 reduces the RMSEs by about 1 mm/ day. The SCORs for summer precipitation are relatively lower at about 0.6, and RMSEs are the highest among the four seasons. WRF9 improves the summer precipitation simulation by a significant reduction of RMSE each year. It also gives higher SCORs and lower RMSEs than ERA5 in autumn and winter.
Taylor diagrams (Taylor 2001) are used to further assess the model's ability to simulate the inter-annual variability of annual mean precipitation and surface air temperature in three sub-regions (Fig. S3). For T2m, the WRF9 simulations are closer to the in-situ observation reference especially for TP-NE and TP-SE, indicating better performance in reproducing inter-annual variation of T2m over these two subregions. However, ERA5 shows larger normalized standard deviations over TP-C and TP-SE. WRF9 outperforms ERA5 in simulating the inter-annual variability of Tmax with higher TCORs, and the normalized standard deviations are close to 1. Compared to T2m and Tmax, both WRF9 and ERA5 show relatively lower performance in simulating the inter-annual variation of Tmin. WRF9 shows better skill over TP-SE and TP-NE. For precipitation, WRF9 can largely improve the simulation of inter-annual variability over TP-C and TP-SE. Over TP-NE, both WRF9 and ERA5 show lower normalized standard deviations compared to insitu observation, and ERA5 slightly outperforms WRF9 with higher TCOR.
The monthly variations of regional averaged T2m and precipitation over three sub-regions are shown in Fig. 6. Over all three sub-regions, ERA5 shows significantly overestimated monthly precipitation, while WRF9 slightly underestimates it. Large wet biases up to 3 mm/day can be found in ERA5 simulated precipitation in warm seasons, especially over TP-C. T2m is well reproduced by both ERA5 and WRF9 over TP-NE region with slightly overestimated summer T2m in WRF9 simulation. Over TP-C and TP-SE regions, cold biases exist in WRF9 simulation and ERA5 reanalysis, and ERA5 tends to produce larger cold biases in cold seasons. Overall, WRF9 can reduce the wet biases existing in ERA5 in summer and cold biases in winter.
In general, WRF9 can improve the simulation of interannual variability and seasonal variation of surface temperature and precipitation, and shows added value compared to the driving data-ERA5 reanalysis.

Daily precipitation frequency and intensity
In terms of daily precipitation, the 20-year averaged daily precipitation frequency and intensity are calculated and shown in Fig. 7. The daily precipitation event is defined as the day which has an accumulated precipitation larger than 0.1 mm. For the observed distribution of precipitation frequency fraction, the high frequency is located over the southern and southeastern TP with a frequency fraction above 40%, and the frequency gradually decreases from southeast to northwest. WRF9 can reproduce the spatial pattern of precipitation frequency fraction with the SCOR at 0.88 and RMSE at 0.08, while it slightly overestimates the number of daily precipitation event over the TP especially over TP-SE. Although ERA5 can reproduce the distribution of precipitation frequency with SCOR at 0.78, it simulates much more daily precipitation events over the TP with a frequency fraction above 80%, which is twice summer (e-f) averaged daily precipitation (interpolated to observations) between WRF9, ERA5 and gauged observation, and the same for annual (c, d) and summer (g, h) averaged daily precipitation (interpolated to IMERG) between WRF9, ERA5 and IMERG, unit: mm/day Fig. 4 The spatial distribution of temporal RMSEs (row 1) and TCORs (row 2) of annual T2m temperature from WRF9 and ERA5 compared with in-situ stations, unit of (a, b): ℃ Fig. 4. but for precipitation, unit of (a, b): mm/day that of in-situ observation. The mean daily precipitation intensity is represented by using the total precipitation divided by precipitation days. The observed strong precipitation intensities (above 5.0 mm/day) are located over TP-SE and TP-C, and the intensity decreases from south and southeast to northwest. Both WRF9 and ERA5 can simulate the spatial distribution of precipitation intensity with the SCOR at about 0.73, but they all underestimate the intensity over the TP, especially over TP-SE and TP-C where high intensity exists in observation. Moreover, ERA5 tends to produce weaker precipitation intensity over TP-NE compared to WRF9 and in-situ observations. It can be concluded that the overestimation of precipitation  Figure 8 shows the regional averaged annual precipitation days at different precipitation intensities over TP-NE, TP-C and TP-SE. In general, WRF9 can well simulate the precipitation days at different precipitation intensities over all three sub-regions with slight underestimations over TP-NE and slight overestimations from 0.1 to 4.0 mm/day over the other two sub-regions. However, ERA5 significantly overestimates the precipitation days at all precipitation intensities over all sub-regions, leading to an overall overestimation of precipitation amount. In general, the WRF9 experiment at the gray-zone resolution can improve the simulation of daily precipitation frequency and intensity, and shows added value in reproducing daily precipitation events.

Diurnal cycle of precipitation
The IMERG satellite precipitation product is used as observation when evaluating the diurnal variation of precipitation from different sources. Figure 9 shows the peak time (LST, Local Standard Time) of precipitation amount, frequency, and intensity during 2001-2019 from IMERG, WRF9 simulation, and ERA5 reanalysis, ignoring the year 2000 because the IMERG observation begins from June 2000. As Singh and Nakamura (2009) shows, precipitation activity peak time occurs during late afternoon, late evening, and secondary morning over hilly regions, valleys and large lakes, respectively. In Fig. 9, compared to ERA5, WRF9 exhibits more details due to the improvement of terrain representation. The occurrence time of maximum precipitation amount and frequency in mountains and valleys is consistent with Singh and Nakamura (2009). As is shown in Fig. 9a, based on satellite observation, the occurrence time of maximum precipitation amount appears from late afternoon to evening (1600-2000LST) along the southern edge of TP, TP-C, part of TP-SE and TP-NE regions except for Qaidam Basin, and from nighttime to early morning (2000-0400LST) over other regions. WRF9 can generally reproduce the peak time of the maximum precipitation mainly occurring during nighttime over TP-SE and southern TP, while it simulates the peak time about 4 h later than IMERG over north TP-C. Significant advanced precipitation amount peak time can be found in ERA5 over southern TP and TP-SE, where the maximum precipitation amount occurs from later afternoon to evening, which is about 4 h earlier than IMERG. Both WRF9 and ERA5 delay the peak time by about 4 h over Qaidam Basin. For the diurnal cycle of precipitation frequency, the spatial distribution of peak time for IMERG is similar to that of precipitation amount, which is the same for WRF9. But for ERA5, there are great differences between the diurnal cycle of precipitation frequency and amount. As Fig. 7 shows, the frequency of daily precipitation events in ERA5 is above 80% over the TP, which is twice that of insitu observation. Figure 9 shows that the precipitation events that ERA5 simulates occur mainly in early afternoon, which may result from the excessive convective precipitation in ERA5. For the diurnal cycle of precipitation intensity, the peak time for IMERG appears from nighttime to early morning (2000-0400LST) over most regions of the TP, except for north of TP-C where the maximum precipitation intensity occurs from afternoon to evening (1400-1800LST). Similar to the simulation of precipitation amount peak time, WRF9 well simulates the intensity peak time over southern TP with the maximum precipitation intensity occurring during nighttime, but delays the peak time by about 6 h over north of TP-C. ERA5 can reproduce peak time over western and north of TP-C, while it reaches the precipitation peak about 6 h in advance over TP-SE.
The regional averaged diurnal cycle of precipitation amount, frequency, and intensity over three sub-regions during 2001-2019 are shown in Fig. 10. Over all sub-regions, ERA5 overestimates precipitation amount, frequency, and intensity, especially around 1800LST. Over TP-NE, both WRF9 and ERA5 can reproduce the peak of precipitation amount and frequency in the afternoon (1800LST). While ERA5 cannot reproduce the diurnal variation of precipitation intensity with the peak time 6 h in advance, while WRF9 can fit the diurnal variation better. Over TP-C, the satellite observation has peak precipitation amount and frequency at about 2000 LST, while the precipitation peaks in WRF9 and ERA5 are about 4 and 2 h backward and ahead, respectively. Over TP-SE, the peaks of precipitation amount, frequency, and intensity from ERA5 are all 3-4 h ahead of IMERG. WRF9 outperforms ERA5 in reproducing the diurnal variations of precipitation with closer peak time compared with IMERG.

Main mechanism for the added value of WRF9 simulation
As the spatial resolution increases, WRF9 outperforms ERA5 in many aspects not limited to T2m and precipitation. The improvement of T2m may result from the difference in surface energy balance between WRF9 and ERA5. The improvement of precipitation may result from an overall improvement of the precipitation formation related environment, in which water vapor content and transport are discussed in this study. We will also discuss the main mechanism for the diurnal cycle of precipitation in the following.

Surface energy balance
The surface energy budgets including radiation factors and surface heat fluxes have a strong connection with surface skin temperature. The surface energy balance and its connection with surface skin temperature over the Tibetan Plateau are estimated following the methodology in Li et al. (2016a). The surface energy balance equation is given by: where is the bulk emissivity, σ is the Stefan-Boltzmann constant, T s is surface skin temperature, ALBDEO is the land surface albedo, RSDS and RLDS are the downward shortwave and longwave radiation at surface, respectively, SHFX and LHFX are the sensible and latent heat flux at surface, respectively, and GHFX is the ground heat flux, which is generally very small in the long-term average. To analyze the added value of WRF9 in simulating the distribution of surface air temperature, the annual and winter (when the maximum temperature bias occurs) mean surface net energy (the sum of right-hand terms in the above equation) from WRF9 and ERA5, and their differences are calculated and shown in Fig. 11. Generally, the spatial patterns of the surface net energy obtained from WRF9 simulation are close to that from ERA5, but with more details of local-scale information. From the differences in the surface energy between WRF9 and ERA5 (Fig. 11e, f), it is clear that WRF9 exhibits more surface net energy than ERA5 over TP-SE, especially in winter which shows positive deviation up to 35 W/m 2 .
While negative values can be found over most regions in western TP and Qaidam Basin, where WRF9 simulates less net energy. ERA5 has less downward shortwave radiation as a result of cloud reflection and scattering caused by overestimated precipitation over almost the whole TP. Nevertheless, ERA5 has more surface net energy in western TP, surface albedo may play a primary role in surface net absorbed shortwave radiation, which leads to more surface net energy in ERA5 in western TP. Through considerable influence on the surface albedo, snow cover plays an important role in the surface energy balance (Xu et al. 2017b;Xie et al. 2019), which has direct effects on surface energy and temperature. Figure 12 shows the annual and winter mean snow cover over the TP from IMS satellite product, WRF9 experiment, and ERA5 reanalysis and the differences in snow cover and surface albedo between WRF9 and ERA5. The snow cover fraction (SCF) Fig. 10 Diurnal cycle of precipitation amount (a, d, g, unit: mm/h), frequency (b, e, h, unit: 100%), and intensity (c, f, i, unit: mm/h) averaged over three subregions for IMERG, WRF9 and ERA5 (averaged over the grids in each subregion) is calculated using the snow depth data from the relation SCF = min (1, snow depth/10) in WRF9 and ERA5, where snow depth is in centimeters. WRF9 can well simulate the spatial pattern of SCF with the maximum SCF over western TP and TP-SE, but it overestimates the SCF over northern TP. ERA5 also can reproduce the maximum SCF over western TP and south-central TP, but it underestimates the SCF over northern TP. The difference in SCF between WRF9 and ERA5 clearly shows the positive SCF deviation over northwestern TP, and the negative deviation over TP-SE, which is quite similar to the difference of surface albedo and surface net energy. There is little difference between the pattern of SCF and surface albedo, which may be due to the calculation method of SCF. Thus, it can be inferred that the difference in SCF between WRF9 and ERA5 may affect the surface albedo and surface net energy, which leads to the difference in surface temperature. WRF9 tends to simulate less SCF over TP-SE, resulting in the improvement of surface air temperature simulation.
It is found that surface albedo plays a major part in influencing the simulated surface temperature, particularly in winter Pang et al. 2022;Lin et al. 2020). GHFX plays a vital part in the improvement of Tmin. Although GHFX is very small in the long-term average, it is crucial in the adjustment of the energy diurnal cycle. GHFX from WRF9 is larger than that from ERA5 in the eastern and central TP, which leads to the rise of Tmin in those regions. Overall, more surface net energy leads to higher surface temperature, which resulted in the improvement of the cold bias of T2m, Tmax and Tmin in WRF9 simulation over TP-SE.

Water vapor content and transport
The water vapor content and its transport are very crucial to the distribution of precipitation Li et al. 2016a;Yan et al. 2020). Figure 13 depicts the 20-year mean (2000-2019) distribution of annual and summer mean specific humidity at 450 hPa and vertically integrated water vapor transport from WRF9 and ERA5, and the differences  , b), ERA5 (c, d) and the differences (e, f) between them, unit: W/m 2 between them. The WRF9 simulated spatial patterns of specific humidity at 450 hPa are in good agreement with that of ERA5, with high water vapor content over southern TP and decreasing from south to north. Both WRF9 and ERA5 show the main water vapor transport from the southern boundary and western boundary over the TP, especially in summer. The difference in specific humidity at 450 hPa between WRF9 and ERA5 shows that WRF9 simulates less water vapor content over most regions in the TP compared to ERA5, which is consistent with the difference Fig. 12 20-year mean distribution of annual and winter mean snow cover over TP from IMS (a, b), WRF (c, d) and ERA5 (e, f) and the differences of snow cover (g, h) and surface albedo (i, j) between WRF and ERA5, unit: 100% in precipitation distribution. And the negative deviation of specific humidity at 450 hPa in WRF9 simulation is mainly caused by less water vapor transport from the southern TP boundary. Previous studies (Lin et al. 2018;Wang et al. 2020;Zhou et al. 2021) have shown that high horizontal resolution can resolve more orographic drag and other processes associated with heterogeneous surface forcing, leading to weakened northward flow which also decreases the northward water vapor transport.

Main mechanism for the diurnal cycle of precipitation
The diurnal cycle of precipitation is an important aspect of regional climate, and the regular occurrence of precipitation at a specific time of day is related to physical processes that indicate strong convection during the specific time (Yang and Slingo 2001;Sorooshian et al. 2002). There are very few studies that explains the mechanism of the diurnal cycle of precipitation in TP. We will make effort to explore the possible mechanisms behind the diurnal cycle of precipitation. Generally, the occurrence time of maximum precipitation amount, frequency, and intensity in Fig. 10 is similar to that of Ou et al. (2020). We have additionally divided TP into three sub-regions to investigate the regional differences of precipitation diurnal cycle, which has not been done previously. There are several main factors that influence precipitation: sufficient water vapor supply, fierce ascending motion to the supersaturated state, enough cloud condensation nuclei (CCD), and other factors. Figure 14 shows the 20-year mean distribution of water vapor mixing ratio at 2 m (Q2) and wind speed at 10 m (UV10) for WRF and ERA5 from 1600 to 0200 LST with two-hour intervals, respectively. Over TP-SE, Q2 reaches its peak at 1600 LST, at which time Fig. 13 20-year mean distribution of annual and summer mean specific humidity at 450 hPa (shaded, unit: g/kg) and vertical integrated water vapor transport (vectors, unit: kg/(m*s)) from WRF (a, b), ERA5 (c, d) and the differences between them (e, f)

3
Fig. 14 20-year mean distribution of anomaly of water vapor mixing ratio at 2 m (shaded, unit: g/kg) and wind speed at 10 m (vectors, unit: m/s) for WRF (a-f) and ERA5 (g-l) from 1600 to 0200LST with two-hour intervals the formation of convergence occurs, thus leading to ascending motion for ERA5. Thus, the peak time of precipitation amount over TP-SE occurs in late afternoon (1600-1800 LST). In WRF9, TP-SE is controlled by fierce westerly and Q2 does not reach its peak at 1600 LST. Until 2000 LST, conditions favoring the formation of precipitation reflected with Q2 peak and water vapor convergence are satisfied. Therefore, the peak time of precipitation amount over TP-SE for WRF9 occurs at around 2000 LST. Over TP-C, where there is less Q2 compared to TP-SE, Q2 reaches its peak between 2200 and 0000 LST for WRF9, when precipitation is most likely to occur. But for ERA5, there are enough Q2 at around 1800 LST over the eastern part of TP-C, and without the strong controlled of westerly, precipitation occurs at around 1800 LST. As time goes on, the rain belt gradually moves westward, with the precipitation peak getting later towards the west, as is shown in Fig. 9c. The main cause leading to the difference of precipitation peak time between WRF9 and ERA5 over TP-C may be the representation of topographic features. There are more details about mountains and valleys in WRF9 than in ERA5. The barrier effects of high mountains and the channeling effect of valleys are better exhibited in WRF9 (Lin et al. 2018). As is shown in Fig. 15, the peak time of Q2 over TP-C in WRF9 occurs at 0000 LST compared 1600 LST in ERA5. It may be the rough representation of terrain in ERA5 that caused Q2 to reach the precipitable condition earlier than in WRF9, not only in magnitude but also in transport speed. Over TP-SN, there are little differences of precipitation peak time between WRF9 and ERA5, except for the magnitude of precipitation. A 20-year (2000-2019) dynamical downscaling simulation at the gray-zone resolution (9 km) using the WRF model driven by the ERA5 is conducted over the TP. The model's ability in reproducing the long-term regional climate is evaluated compared to ERA5 through the assessment of the simulated surface air temperature and precipitation against in-situ observations and IMERG satellite precipitation product. Also, possible reasons for model biases are discussed in this study. Firstly, WRF9 improves annual and seasonal climatological characteristics in terms of T2m, Tmax, Tmin, and precipitation against the driving data ERA5. Improvements include the reduced cold bias, especially for Tmax, and obviously reduced wet bias all year long. Secondly, the interannual and seasonal variations of near-surface temperature and precipitation are also improved in WRF9. More specifically, winter cold biases and warm season wet biases are more obviously reduced, especially over TP-C and TP-SE. Thirdly, WRF9 better reproduces the precipitation intensity and significantly improved the simulation of precipitation frequency. Finally, the peak time and diurnal precipitation amount and intensity features over TP are better reproduced in WRF9 due to a more realistic topographical representation of high mountains and valleys.

Conclusion
The improvements in WRF9 are thought to be more realistically simulated surface net energy balance, snow fraction, and water vapor content and transport due to better representation of topography. For near-surface temperature, more absorbed surface net energy heats the surface more Fig. 15 Time-latitude cross section of 20-year averaged diurnal precipitation from WRF9 (a) and ERA5 (b), averaged from 85° to 94° E (unit: g/ kg). The white dotted line shows the peak time of Q2 transporting to TP-C intensively than ERA5, leading to higher T2m which reduce the cold biases, especially over TP-SE. Less simulated snow fraction in WRF9 plays an important role in modulating the surface albedo, which also affects surface net energy and result in the improvement of surface air temperature simulation. In terms of improved precipitation features, less water vapor transport from the southern boundary of TP attributable to the better represented topographical features leads to lower specific humidity and water vapor content in the WRF9 simulation compared to ERA5, which is consistent with the difference in precipitation distribution between them, explaining the reduced wet bias in WRF9. The better represented topographical features such as high mountains and valleys and the finer resolution also resulted in the better simulated diurnal peak times of precipitation. The dataset produced by WRF9 still have some limitations. For example, near-surface temperature simulation in spring and winter is not satisfying enough compared to those in summer and autumn, especially for Tmin. Despite the improvement in precipitation frequency and intensity, WRF9 simulates slightly more precipitation events for precipitation intensities below 7.0 mm/day. Since the horizontal resolution used in this study is 9 km, which is still relatively sparse to represent the steep terrain in the south edge of TP, some features over there may not be fully represented. With finer resolution, the water vapor transport and the formation of precipitation are expected to be more realistically characterized. Moreover, due to the sparse distribution of in-situ stations in the western TP, it is difficult to fully reflect the added values of WRF9. Furthermore, there is only a single ensemble simulation in our study, multiple ensemble simulations with different parameterizations could be conducted in the follow-up work if computing resources are allowed.
On the basis of evaluation of WRF9's ability in simulating the annual mean, inter-annual, and seasonal variation, daily and diurnal characteristics of surface air temperature and precipitation, WRF9's added value is validated and understood more comprehensively. Promisingly, this will promote understanding of long-term climate conditions for better policy adaption over the TP and provide experience for future regional climate simulation at the gray-zone scale, even at CPM scale.