Summer regional climate simulations over Tibetan Plateau: from gray zone to convection permitting scale

The Tibetan Plateau (TP) is often referred to as ‘the Third Pole’ and plays an essential role in the global climate. However, it remains challenging for most global and regional models to realistically simulate the characteristics of climate over the TP. In this study, two Weather Research and Forecasting model (WRF) experiments using spectral nudging with gray-zone (GZ9) and convection-permitting (CP3) resolution are conducted for summers from 2009 to 2018. The surface air temperature (T2m) and precipitation from the two simulations and the global reanalysis ERA5 are evaluated against in-situ observations. The results show that ERA5 has a general cold bias over southern TP, especially in maximum T2m (Tmax), and wet bias over whole TP. Both experiments can successfully capture the spatial pattern and daily variation of T2m and precipitation, though cold bias for temperature and dry bias for precipitation exist especially over the regions south of 35° N. Compared with ERA5, the added value of the two WRF experiments is mainly reflected in the reduced cold bias especially for Tmax with more improvement found in CP3 and the reduced wet bias. However, the ability of the convection-permitting WRF experiment in improving the simulation of precipitation seems limited when compared to the gray-zone WRF experiment, which may be related to the biases in physical parameterization and lack of representativeness of station observation. Further investigation into surface radiation budget reveals that the underestimation of net shortwave radiation contributes a lot to the cold bias of T2m over the southeastern TP in GZ9 which is improved in CP3. Compared with GZ9, CP3 shows that larger specific humidity at low-level (mid-high level) coexists with more precipitation (clouds) over the southern TP. 
This improvement is achieved by better depiction of topographic details, underlying surface and atmospheric processes, land–atmosphere interactions and so on, leading to stronger northward water vapor transport (WVT) in CP3, providing more water vapor for precipitation at surface and much wetter condition in the mid-high level.


Introduction
The Tibetan Plateau (TP) extends over the area of 27°-45° N, 70°-105° E, covering a region about a quarter of the size of the Chinese territory. Surrounded by the Earth's highest mountains, such as the Himalayas, Pamir, Kunlun Shan and others, it is the highest and most extensive plateau in the world (Kang et al. 2010; Xu et al. 2018; Gu et al. 2020) and has long been known as the roof of the world (Liu and Chen 2000; Qiu 2008; Yao et al. 2019). Mountains in the TP have a strong impact on precipitation distribution, and knowledge of the characteristics of precipitation is a basic and important requirement for the planning and management of water resources in East and South Asia (Xu et al. 2008). Meanwhile, in the summer season, the TP serves as a huge heat source (Zhu et al. 2017), transferring heat from the surface to the air in the form of sensible and latent heating and effective radiation (Yeh 1957), especially over the central and eastern regions (Duan et al. 2012), and plays an important role in the onset and maintenance of the Asian summer monsoon. However, the pattern and variations of summer precipitation across the TP are poorly known compared with many other mountain areas in the world.
The complex climate over the TP is a key component of the regional and global climate system, but the lack of basic observation data makes it difficult to assess the impact that the TP has on climate change across scales. Because only sparse observations are available from heterogeneously distributed meteorological stations over the TP (Kuang and Jiao 2016; Maussion et al. 2011; Xiao et al. 2016; Li et al. 2018), numerical simulation results have proven to be a reasonable and reliable complement for enhancing the understanding of climate over the TP. Compared to the global climate models (GCMs), higher-resolution regional climate models (RCMs) are able to depict regional heterogeneity and lead to a better understanding of regional to local climate change signals. The main achievements in RCM research have benefited from the increase in simulation length and resolution (Giorgi 2019). Several studies have shown that added value is obtained by increasing the horizontal resolution of RCMs to capture additional fine-scale weather processes (Jacob et al. 2014; Di Luca et al. 2011; Lucas-Picher et al. 2012). With its complex orography, the TP is very sensitive to the horizontal resolution of RCMs (Gao et al. 2015a, b, 2017b). Gao et al. (2018) found that the WRF model with a resolution of about 30 km shows reduced overestimation of extreme precipitation frequency, increased spatial pattern correlations of simulated precipitation, and more accurate linear trends of precipitation compared with coarser-resolution GCMs and reanalysis over the TP. Xu et al. (2018) showed that the added value of RCM simulation at about 25 km resolution is achieved by affecting the regional air circulation near the ground surface around the edge of the TP, which leads to a redistribution of the transport of atmospheric water vapor.
Convection is considered to be one of the most critical physical processes affecting the occurrence and amount of precipitation (Kukulies et al. 2020; Niu et al. 2020), while cumulus parameterization schemes (CPSs) have been considered to be a primary uncertainty source in precipitation simulations over the TP for coarse-resolution (~ 25 km) RCMs (Wang et al. 2021b). Attempts have been made to overcome this difficulty by further increasing the resolution of RCMs. A grid spacing of around 10 km is the so-called gray zone, at which individual convection cells cannot be resolved but organized mesoscale convective systems can be explicitly represented (Ou et al. 2020).
When the grid spacing is reduced to less than 4 km, it is widely referred to as the convection-permitting scale. With gray-zone grid spacing, a CPS may or may not be turned on. In Asia, Chen et al. (2018) found that the WRF at the 9 km gray-zone resolution without the use of a CPS captures the salient features of the Indian summer monsoon as well as the spatial distributions and temporal evolutions of monsoon rainfall. Taraphdar et al. (2021) evaluated the WRF at the 9 km gray-zone resolution over the United Arab Emirates (UAE) and the Middle East, and found that the performance of gray-zone simulations for synoptic and mesoscale precipitation is comparable to that of convection-permitting simulations with optimized model physical packages. Ou et al. (2020), based on WRF simulations at gray-zone resolution with different CPSs and a simulation without a CPS over the TP, found that, with respect to the precipitation diurnal cycle, the frequencies and initiation timings of short-duration (1-3 h) and long-duration (> 6 h) precipitation events are well captured by the experiment without a CPS.
Future directions in RCM research are discussed by Giorgi (2019), with a highlight on the transition to convection-permitting modeling systems. Benefiting from the rapid development of high-performance computing resources, convection-permitting models (CPMs) are becoming affordable for climate study. They can explicitly resolve deep convection (Liang 2004; Dai 2006; Prein et al. 2015; Zhang and Chen 2016), eliminate the biases resulting from the application of CPSs, and narrow the uncertainty from model physics (Weisman et al. 1997; Miura 2007; Schlemmer et al. 2011; Satoh et al. 2014; Ban et al. 2015), especially over regions with prevailing convective activity. Added values, such as improved simulations of the buildup and melting of snowpack (Liou et al. 2013) as well as improvements of temperature at a height of 2 m related to improved representation of orography (Hohenegger et al. 2008; Prein et al. 2013), have been found in CPM climate simulations. Many studies have also demonstrated other benefits of using CPMs, including the ability to capture observed precipitation diurnal cycles over the subtropics (Guo et al. 2019, 2020; Li et al. 2020; Yun et al. 2020), to replicate the spatial distribution of precipitation over complex terrain (Grell et al. 2000; Prein et al. 2013; Rasmussen et al. 2014; Gao et al. 2020), and even to represent the spatial-temporal scales and the organization of tropical convection at the nearly global scale (Schiwitalla et al. 2020).
However, it is also important to mention that CPM climate simulations are not the cure for all model biases. The largest added value can be found on small spatial and temporal scales (< 100 km and subdaily) or in regions with steep orography (Prein et al. 2015) such as the TP. Li et al. (2021) demonstrated that CPM is a promising tool for dynamic downscaling over the TP with its higher ability to depict the precipitation frequency and intensity. Zhou et al. (2021) found that CPM outperforms the High Asia Refined regional reanalysis (HAR v2, Wang et al. 2020) for 10-m wind speed and precipitation, with an obviously reduced wet bias over the TP. Furthermore, process-based analysis methods can reveal deeper insights into the more physically and dynamically consistent atmospheric phenomena in CPM climate simulations. Lin et al. (2018) showed that simulation with finer resolutions (especially 2 km) can diminish the positive precipitation bias over the TP through decreased water vapor transport, which is reflected mostly in the weakened wind speed. However, modeling clouds remains a challenge even with CPMs, which still require several parameterizations (shallow convection, microphysics cumulus schemes) that need to be adapted for finer resolutions (Kendon et al. 2021). To date, one of the main challenges associated with the use of CPMs lies in their heavy computational requirements and demanding output storage sizes (Schär et al. 2020). Another challenge lies in the lack of reliable gridded observations with high temporal and spatial resolution, which affects the evaluation of CPM simulations and especially the assessment of their added value, often linked with sub-daily time scales and extremes. These challenges limit the characterization of the different sources of CPM uncertainty and hamper the uptake of CPMs in climate change assessments and impact studies (Lucas-Picher et al. 2021; Prein et al. 2017, 2020). Both gray-zone and CPM simulations are at an early stage in regional climate application, and few studies have so far intercompared simulations at the gray-zone scale and the convection-permitting scale, especially over the TP. A gap exists in understanding the added value gained in moving from the gray-zone to the convection-permitting scale, a step that requires a significant increase in computational resources.
Furthermore, previous studies with CPM over the TP were mostly limited to short-term simulations. In this study, two types of high-resolution experiments using the WRF model, the gray-zone resolution (GZ) of 9 km with no CPS and the convection-permitting (CP) resolution of 3 km, are performed over the TP for the summers of 2009 to 2018. By comparing the two sets of simulation results, we can: (1) evaluate the model's performance at the two resolutions in reproducing the spatiotemporal characteristics of surface summer climate over the TP; (2) identify the added value of convection-permitting simulation over complex terrain; and (3) isolate the contribution of the convection-permitting experiments in improving the simulation of regional climate processes.
The article is organized as follows. Section 2 describes the model and experimental design, data and methodology. Section 3 presents the main results as well as the comparison with the observations, including the added value of CPM simulation. Section 4 discusses the possible reasons for explaining the excessive precipitation and higher 2-m air temperature in CPM simulation. Finally, major conclusions are presented in Sect. 5.

Model and experimental design
The WRF model version 4.1.1 (Skamarock et al. 2019) used in this study is a nonhydrostatic mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. The WRF model has been widely used for CPM regional climate simulations over Europe (Warrach-Sagi et al. 2013), North America (Gao et al. 2017a; Liu et al. 2017; Sun et al. 2016), Eastern China (Yun et al. 2020) and the TP (Gao et al. 2020). The simulation domain in this study is centered at 33° N and 88.5° E, with 1081 (361) grid points in the east-west direction and 721 (241) grid points in the north-south direction for the 3 km (9 km) resolution, covering the whole TP and the surrounding areas (Fig. 1). Fifty hybrid-sigma levels are defined from the surface to the model top at 50 hPa. The horizontal resolution is set to 9 km for the gray-zone scale simulation (GZ9) and 3 km for the CP scale simulation (CP3) over the TP.
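As a quick consistency check on the grid dimensions quoted above, the following minimal sketch computes the horizontal extent spanned by each grid. It assumes the extent is simply (number of points − 1) × grid spacing; the function and dictionary names are illustrative and are not taken from the WRF configuration.

```python
def domain_extent_km(n_points: int, dx_km: float) -> float:
    """Distance between the first and last grid point along one axis (assumed definition)."""
    return (n_points - 1) * dx_km


# Grid dimensions as stated in the text; the layout of this dictionary is illustrative.
grids = {
    "CP3": {"dx_km": 3.0, "nx": 1081, "ny": 721},
    "GZ9": {"dx_km": 9.0, "nx": 361, "ny": 241},
}

# (east-west extent, north-south extent) in km for each experiment
extents = {
    name: (domain_extent_km(g["nx"], g["dx_km"]), domain_extent_km(g["ny"], g["dx_km"]))
    for name, g in grids.items()
}

for name, (ew, ns) in extents.items():
    print(f"{name}: {ew:.0f} km east-west x {ns:.0f} km north-south")
```

Under this assumption both grids span the same area, which is what allows the two experiments to be compared point by point.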
The physical parameterization schemes employed in this research include the Thompson microphysics scheme (Thompson et al. 2008), the Mellor-Yamada Nakanishi Niino 2.5 level TKE scheme (MYNN) planetary boundary layer (PBL) parameterization (Nakanishi and Niino 2006), the RRTMG shortwave and longwave radiation schemes (Iacono et al. 2008), and the Noah-MP land surface model (Niu et al. 2011). In the two experiments, the CPS is switched off.
Fig. 1 The simulation domain (yellow shading) with the TP framed with red lines, and the locations of the meteorological stations over the TP marked with blue dots
Spectral nudging (hereinafter SN, von Storch et al. 2000), which is mostly used in models driven by global analysis (Tang et al. 2010, 2017), has been adopted in CP3 and GZ9. In this study, a nudging wavenumber of 4 is employed in both directions and the nudging coefficient is 3 × 10⁻⁴. Meanwhile, the SN approach is only applied to wind fields above the PBL to allow the development of the mesoscale circulation. Huang et al. (2021) found that model simulations show clear improvements in their representations of downscaled precipitation intensity and its diurnal variations, atmospheric temperature, and water vapor when spectral nudging is applied towards the horizontal wind and geopotential height.
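To give a feel for what a nudging wavenumber of 4 means, the sketch below estimates the shortest wavelength that would be constrained by the driving data. It assumes that the cutoff wavelength is the domain length divided by the maximum nudged wavenumber, and that the domain length is (number of points − 1) × grid spacing; both are simplifying assumptions, and the function name is hypothetical.

```python
def min_nudged_wavelength_km(n_points: int, dx_km: float, max_wavenumber: int) -> float:
    """Shortest nudged wavelength, assuming cutoff = domain length / max wavenumber."""
    domain_length_km = (n_points - 1) * dx_km
    return domain_length_km / max_wavenumber


# CP3 grid dimensions from the text, nudging wavenumber 4 in both directions
cutoff_x = min_nudged_wavelength_km(1081, 3.0, 4)  # east-west
cutoff_y = min_nudged_wavelength_km(721, 3.0, 4)   # north-south
print(f"east-west cutoff: {cutoff_x:.0f} km, north-south cutoff: {cutoff_y:.0f} km")
```

Under these assumptions only features several hundred kilometres across are nudged, so mesoscale circulations remain free to develop in the model.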
All the experiments are driven by the fifth-generation global reanalysis data (ERA5), with a temporal resolution of 3 h and a spatial resolution of 30 km, from the European Centre for Medium-Range Weather Forecasts (ECMWF) (Hersbach et al. 2020), and are conducted for the summer season (June, July, and August) from 2009 to 2018. Each simulation starts on May 16 and integrates continuously to September 1, with the first 16 days (May 16-31) treated as the spin-up time.
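The yearly run calendar described above can be checked with a few lines of date arithmetic. This is a minimal sketch; it assumes the integration ends at the start of September 1, and the function name is illustrative.

```python
from datetime import date


def summer_run_calendar(year: int) -> tuple[int, int]:
    """Return (spin-up days, analysed days) for one summer simulation."""
    run_start = date(year, 5, 16)
    analysis_start = date(year, 6, 1)
    run_end = date(year, 9, 1)  # assumed to mark the end of the integration
    spinup_days = (analysis_start - run_start).days
    analysis_days = (run_end - analysis_start).days
    return spinup_days, analysis_days


calendar = {year: summer_run_calendar(year) for year in range(2009, 2019)}
total_analysis_days = sum(days for _, days in calendar.values())
print(calendar[2009], total_analysis_days)
```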

Observation data and method
To evaluate the performance of the WRF model in simulating the surface summer climate over the TP, the daily in-situ observations provided by the data service center of the China Meteorological Administration (CMA) are used. Only 144 stations over the TP are used in this study (Fig. 1), namely those with comparatively complete observations of daily surface air temperature (T2m), maximum/minimum surface air temperature (Tmax and Tmin), and precipitation. Most of the meteorological stations are located in the central and eastern parts of the TP, while few are located over the western TP. Therefore, the data from the meteorological stations are not sufficient to fully depict the climate characteristics over the whole TP, especially over the western TP and regions above 4800 m ASL. This is especially true for precipitation, which has strong heterogeneity over the TP.
To validate the WRF-simulated precipitation over the TP more comprehensively, the satellite precipitation product, the Integrated Multi-satellite Retrievals for GPM (IMERG), is also used to carry out a more objective evaluation. IMERG uses inter-calibrated estimates from the international constellation of precipitation-relevant satellites and other data, including monthly surface precipitation gauge analyses, to compute half hour, 0.1° × 0.1° gridded datasets over 60° N-S (and partially outside of that latitude band) (Huffman et al. 2020). IMERG enables a wide range of applications, ranging from studies on precipitation characteristics to applications in hydrology to evaluation of weather and climate models (Tan et al. 2017). Several studies have already been carried out focusing on this new satellite rainfall product, and its satisfying performance is confirmed in India (Prakash et al. 2016), mainland China (Tang et al. 2016) and regions with complex terrain such as TP (Xu et al. 2017). Li et al. (2022) also highlight the superiority of IMERG over the TP in capturing the spatial distribution, magnitudes, and annual cycle of the amount and frequency of precipitation in different phases (rain, snow, and sleet) based on a 20-year analysis.
In addition, the ERA5 reanalysis dataset is also included in the evaluation of WRF experiments in this study to illustrate the added value of the WRF experiments against ERA5. ERA5 is the fifth generation ECMWF reanalysis for global climate and weather, which combines vast amounts of historical observations into global estimates using advanced modelling and data assimilation systems, and is the latest climate reanalysis produced by ECMWF. It provides 3-hourly and monthly data of various atmospheric, land-surface and oceanic climate variables, and includes information about uncertainties for all the variables at reduced spatial and temporal resolutions. The data covers the Earth on a 30 km grid spacing and resolves the atmosphere using 137 levels from the surface up to a height of 80 km. Chen and Ji (2019) evaluated the performance of ERA5 over the TP during the period of 1979-2012 and found that ERA5 well reproduces the temporal and spatial variations of surface air temperature and demonstrates overestimation of precipitation in wet season with an average bias of 1.0 mm/day.
The CERES (Clouds and the Earth's Radiant Energy System) (Wielicki et al. 1996) SYN (Synoptic Radiation Fluxes and Clouds) products (Doelling et al. 2013; Rutan et al. 2015) (hereinafter CERES-SYN) provide a global dataset of radiant fluxes at the surface, at the top of atmosphere (TOA), and in various atmospheric layers, as well as related surface variables, with a spatial resolution of 1° and temporal resolutions of 1-hourly, 3-hourly, daily, etc. This dataset is based on measurements from instruments on board the polar-orbiting NASA satellites Terra and Aqua and is designed for use in studies of climate and the global or regional surface energy budget. Wang et al. (2021a) compared various surface shortwave and longwave radiation products over the three poles and concluded that CERES-SYN has the best relative accuracy in the Qinghai-Tibet Plateau region. Therefore, the records of daily cloud, surface radiation and snow coverage from CERES-SYN are used for evaluation in this study.
Several statistics are calculated to quantify the accuracy of the WRF simulations, including the correlation coefficient, the uncentered root-mean-square error (RMSE), and the relative bias (RB). The correlation coefficient is used to describe the temporal and spatial similarity between the observations and the simulations. The RMSE measures the average magnitude of the deviation of a model simulation from the observation, with mean error, correlation coefficient, and standard deviation considered (Taylor 2001). In addition, water vapor transport (WVT) in the model is examined to reveal related physical processes which are expected to be better represented with finer spatial resolution. WVT at each level is calculated using the following formulas:
$$\mathrm{WVT} = \rho \, r \, \mathbf{v}_h$$
with
$$\rho = \frac{p}{R_d T}$$
To obtain the total column of WVT, WVT is additionally integrated along the metric z coordinate from the surface to the top of the σ levels using the rectangle method as follows:
$$\mathrm{WVT}_{\mathrm{col}} = \sum_{k} \rho_k \, r_k \, \mathbf{v}_{h,k} \, \Delta z_k$$
where ρ is the air density (kg m⁻³), r the mixing ratio for water vapor (kg kg⁻¹), v_h the horizontal wind vector (m s⁻¹), p the pressure (Pa), R_d the gas constant, T the air temperature (K), Δz the thickness of each σ level, and k the index of the σ levels. A detailed description can be found in Curio et al. (2015) or Lin et al. (2018).
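The evaluation statistics and the column-integrated WVT described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the study. The relative bias is assumed here to be the summed model-minus-observation difference divided by the summed observation, in percent, since the text does not give its formula. The value of R_d, the function names, and the convention that the vertical level is the first array axis are also assumptions.

```python
import numpy as np

R_D = 287.05  # gas constant for dry air in J kg-1 K-1 (assumed value)


def correlation(sim, obs):
    """Pearson correlation coefficient between simulation and observation."""
    return float(np.corrcoef(np.asarray(sim, float), np.asarray(obs, float))[0, 1])


def rmse(sim, obs):
    """Uncentered root-mean-square error."""
    diff = np.asarray(sim, float) - np.asarray(obs, float)
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_bias(sim, obs):
    """Relative bias in percent (assumed definition: sum(sim - obs) / sum(obs))."""
    sim = np.asarray(sim, float)
    obs = np.asarray(obs, float)
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))


def column_wvt(p, t, r, u, v, dz):
    """Column water vapor transport (kg m-1 s-1) by the rectangle method.

    All inputs are arrays whose first axis is the vertical level:
    p in Pa, t in K, r in kg kg-1, u and v in m s-1, dz in m.
    Returns the zonal and meridional components.
    """
    p, t, r, u, v, dz = (np.asarray(a, float) for a in (p, t, r, u, v, dz))
    rho = p / (R_D * t)                      # air density at each level
    flux_u = np.sum(rho * r * u * dz, axis=0)
    flux_v = np.sum(rho * r * v * dz, axis=0)
    return flux_u, flux_v
```

Because the sum runs over the first axis, the same function works for a single column or for full three-dimensional fields with shape (level, y, x).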

Results from WRF simulations at gray-zone and convection-permitting scale
Evaluation of the WRF experiments (CP3 and GZ9) focuses on the surface summer climate: T2m, Tmax, Tmin, and precipitation. For comparison with the in-situ observations, the WRF simulation results were interpolated onto the station locations using the inverse distance weighting method. Because of the elevation difference between the WRF grids and the station locations, a lapse rate (LR) is used to bias correct the WRF-simulated T2m, Tmax and Tmin, as well as those from ERA5, when evaluating them against in-situ observations (Gao et al. 2015b; Wang et al. 2018; Du et al. 2007; Kattel et al. 2012). According to the spatiotemporal variability of LR proposed by Wang et al. (2018), the mean LRs over the western TP, northeastern TP, and southeastern TP during summer are −4.90, −4.53, and −4.03 K/km, respectively. These values are consistently smaller in magnitude than the commonly used global mean LR (−6.5 K/km) and are used for bias correction in this study.

Summer mean surface air temperature and precipitation
Figure 2 shows the 10-year averaged (2009-2018) summer mean daily T2m, Tmax, and Tmin from the in-situ observations, the differences between the WRF simulations and the observations, as well as the differences between ERA5 and the observations. The observed T2m decreases from the southeastern TP to the central TP, with the maximum T2m at about 22 °C over the eastern TP and the minimum T2m below 8 °C over the central TP. Both WRF experiments can realistically reproduce the spatial pattern of T2m, with spatial correlation coefficients (SCCs) larger than 0.94, but underestimate the T2m over the regions south of 35° N, especially in GZ9. Compared to GZ9, CP3 clearly improves the T2m simulation, with smaller cold biases over the regions south of 35° N. The simulated distributions of Tmax and Tmin also agree well with the observations, with SCCs above 0.90 and RMSEs below 2.7 °C. ERA5 shows performance comparable to that of the WRF experiments in reproducing the spatial pattern of the 10-year summer mean T2m, with SCCs above 0.95. All of the SCCs above are significant at the 0.01 significance level. CP3 tends to simulate higher T2m than GZ9 and ERA5 over the TP, and is thus more skillful in reducing the cold bias, especially for Tmax, which shows the largest reduction in RMSE compared with GZ9 and ERA5.
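The station-matching step used throughout this evaluation (inverse distance weighting followed by a lapse-rate elevation correction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study. The weighting power of 2, the use of precomputed distances in km, and the additive form of the correction, T + LR × (z_station − z_model), are assumptions. The regional lapse rates are those quoted in the text from Wang et al. (2018); the function names are illustrative.

```python
import numpy as np

# Summer mean lapse rates in K/km quoted in the text (Wang et al. 2018)
LAPSE_RATE_K_PER_KM = {
    "western": -4.90,
    "northeastern": -4.53,
    "southeastern": -4.03,
}


def idw(values, distances_km, power=2.0):
    """Inverse distance weighted mean of neighbouring grid values (power is assumed)."""
    values = np.asarray(values, float)
    d = np.asarray(distances_km, float)
    if np.any(d == 0.0):
        # The station coincides with a grid point: use that value directly.
        return float(values[np.argmin(d)])
    weights = 1.0 / d ** power
    return float(np.sum(weights * values) / np.sum(weights))


def lapse_rate_correct(t_model, z_model_m, z_station_m, region):
    """Shift a model temperature from model elevation to station elevation."""
    lr = LAPSE_RATE_K_PER_KM[region]
    return t_model + lr * (z_station_m - z_model_m) / 1000.0


# Illustrative values only: two neighbouring grid cells around a hypothetical station
t_interp = idw([10.0, 20.0], [1.0, 2.0])
z_interp = idw([4000.0, 4200.0], [1.0, 2.0])
t_station_level = lapse_rate_correct(t_interp, z_interp, 4500.0, "western")
```

Interpolating the model elevation with the same weights as the temperature keeps the elevation difference consistent with the interpolated value being corrected.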
The daily temperature range (DTR) is higher over the regions north of 30° N according to the station observations, exceeding 13 °C (figure not shown), while the minimum DTR below 10 °C is detected over the southeastern TP. Both WRF experiments can well simulate the spatial pattern of DTR with the SCCs about 0.8 and the RMSEs less than 1.5 °C. The spatial pattern of DTR in ERA5 is similar to that of WRF experiments, with lower SCC of about 0.68 and larger RMSE of about 2.3 °C. Consistent with the underestimation of Tmax, obvious cold bias exists over the southern TP in WRF experiments and ERA5. However, WRF experiments can reduce the RMSE by about 1 °C over the southeastern TP compared with ERA5, showing the added value of WRF simulation at both gray-zone scale and convection-permitting scale in reducing the cold bias of Tmax, Tmin and DTR. Figure 3 shows the 10-year averaged summer mean precipitation from the station observations, the WRF simulations, and ERA5 as well as the bias of the WRF simulations and ERA5 against the observations. The observed precipitation decreases from southeast to northwest, with the maximum above 6 mm/day located at the southeastern corner of TP and the minimum less than 1 mm/day over the northeastern TP. CP3, GZ9 and ERA5 can realistically capture the spatial distribution of summer mean precipitation, with SCCs larger than 0.7 which are significant at the 0.01 significance level. However, the RMSEs of WRF simulations are about 1.5 mm/day lower than that of ERA5. Meanwhile, the WRF model clearly underestimates summer precipitation over most regions of TP, especially over the southern TP while ERA5 tends to greatly overestimate that with the most severe wet bias which is even larger than 3 mm/day occurring over the southeastern TP. 
In general, the added value of the WRF experiments lies in the reduced bias over the southeastern TP. With the increase in spatial resolution from GZ9 to CP3, however, the reduction of the dry bias relative to station observations seems to be limited, which may be related to the physical parameterizations in the model configuration as well as to the lack of representativeness of station observations, because the stations are mostly located in valleys.
Taylor diagrams are also presented to evaluate the performance of the two WRF experiments and ERA5 in simulating the spatial distributions of summer temperature and precipitation (Fig. 4) over the TP for each year (2009-2018). For T2m, CP3 slightly outperforms ERA5 and improves more clearly on GZ9 in every year. For Tmax, GZ9 and ERA5 demonstrate similar performance, while CP3 outperforms them with higher SCCs and lower RMSEs in every year. For Tmin, GZ9, CP3 and ERA5 all have similar performance in reproducing its spatial pattern. Unlike T2m, for which the performance of both WRF and ERA5 shows little interannual variability, the performance in simulating each year's precipitation is more variable, especially for ERA5. For precipitation, the two WRF experiments generally show comparable performance and outperform ERA5 with reduced RMSEs.
In general, compared with ERA5, the added value of WRF experiments is mainly reflected in the reduced cold bias of Tmax, with more improvement found in CP3. Meanwhile, CP3 and GZ9 demonstrate a moderate dry bias, while ERA5 shows a much larger wet bias for precipitation. The improvement of CP3 over GZ9 in simulating summer precipitation seems limited, which may be partly related to the choice of physical parameterization schemes when the CPS is switched off and partly related to the lack of representativeness of station observations.
Fig. 2 The 10-year averaged (2009-2018) summer mean T2m, Tmax, and Tmin from the in-situ observations (a-c), the biases in CP3 (d-f), the biases in GZ9 (g-i), and the biases in ERA5 (j-l)

Daily surface temperature and precipitation
The 10-year averaged (2009-2018) daily variations of the regional mean (over the TP) T2m, Tmax, Tmin, and DTR from the in-situ observations, WRF experiments and ERA5 are shown in Fig. 5a-d. The observed T2m ranges from 12 to 16 °C throughout the summertime, with the maximum T2m in early and middle July. Both WRF experiments and ERA5 capture the daily variation of T2m well, with temporal correlation coefficients (TCCs) higher than 0.95 and RMSEs less than 1.1 °C. CP3 outperforms GZ9 and ERA5 by reducing the cold bias. With bias correction applied, ERA5 performs as well as GZ9 in simulating the daily variation of T2m, even though the cold bias is significantly smaller in GZ9 than in ERA5 when no correction is done. The observed Tmax ranges from 18 to 24 °C, and CP3 and GZ9 also simulate its daily variation well, with an underestimation of about 1.0 °C for CP3 and 2.0 °C for GZ9, which is better than the 3.5 °C for ERA5. For Tmin, CP3 also reduces the RMSE by about 0.6 °C compared to GZ9 and by about 1.5 °C compared to ERA5. The observed DTR varies between 9 °C and 15 °C, with the minimum DTR occurring in early July. Both WRF experiments reproduce the daily DTR variation with TCCs larger than 0.91 and RMSEs less than 1.2 °C, and a larger cold bias occurs in late July and early August. CP3 tends to simulate the DTR closer to observation. To conclude, both WRF simulations show added value for characterizing the spatial pattern and daily variation of extreme temperature (Tmax, Tmin and DTR) with reduced cold bias, and more improvement is achieved with finer spatial resolution.
Regarding the daily variation of regional mean precipitation over the TP (Fig. 5e), CP3, GZ9 and ERA5 can reproduce the daily variation with the TCCs all about 0.9 and the RMSEs below 0.75 mm/day for WRF experiments and below 2 mm/day for ERA5 when compared with the station observations. It is obvious that ERA5 performs relatively better in capturing the daily variation of summer precipitation, but with severe wet bias that is greatly reduced in WRF simulations.
The spatial distributions of TCCs and RMSEs of T2m, Tmax, and Tmin at each observational station are shown in Figs. 6 and 7, respectively. The spatial patterns of TCCs of T2m and Tmax are quite similar in WRF experiments and ERA5, with high TCCs above 0.9 located over the northeastern TP and decreasing from north to south. Compared with GZ9, CP3 exhibits higher TCCs of T2m and Tmax, especially over the southern TP. The WRF model has relatively lower performance in simulating variation of Tmin than that of T2m, with the TCCs ranging from 0.5 to 0.8. ERA5 demonstrates slightly higher TCCs for T2m and Tmax than WRF experiments especially over the northern TP, and it produces larger TCCs for Tmin at almost all the stations over the TP. For RMSEs, both experiments and ERA5 show large RMSEs that are above 3.0 °C for Tmax over the southern TP, which is greatly reduced in WRF experiments and improves more with higher spatial resolution. With the highest resolution, CP3 can reduce RMSEs for T2m, Tmax and Tmin to a large extent. Therefore, it can be concluded that CP3 improves the simulation of T2m with lower RMSEs, especially over the southern TP.
The spatial distributions of TCCs, RMSEs and RBs of the simulated daily precipitation for the WRF experiments and ERA5 against station observations are presented in Fig. 8. The two experiments show quite similar spatial patterns of TCCs and RMSEs. High TCCs exist over the eastern TP and low RMSEs are located over the central and northern TP. CP3 slightly increases the TCCs by about 0.1 and reduces the RMSEs by about 0.5 mm/day over the southeastern TP. ERA5 shows higher TCCs (above 0.7) than the WRF experiments. ERA5 shows larger positive RBs at almost all stations over the TP, with the largest RBs located over the southeastern TP, while the WRF experiments have smaller but negative RBs, between −30% and 0, over most regions of the TP. The absolute values of the RBs are smaller in GZ9, indicating that finer resolution has limited ability to improve the simulation of summer precipitation.
In general, both CP3 and GZ9 show comparable and satisfactory performance in reproducing the daily variation of T2m. The WRF experiments outperform ERA5, especially in reducing the cold bias for extreme temperatures such as Tmax, Tmin and DTR, and the improvement increases with higher resolution. ERA5 captures the daily variation of precipitation better than the WRF simulations, but with larger RBs. With finer resolution, the ability of WRF to reduce the dry bias is limited, which may be attributed to the biases in physical parameterization schemes.

Surface radiation balance and cloudiness
To investigate the causes of the cold bias of simulated T2m in the WRF, the surface energy budget, including the radiation and heat fluxes that strongly influence the surface skin temperature, is studied with the reanalysis dataset ERA5 as the reference. According to Xu et al. (2015), the surface energy balance equation is written as follows:
$$\sigma T_s^4 = SW^{\downarrow} - SW^{\uparrow} + LW^{\downarrow} - SH - LH - GHF$$
where σ, T_s, SW↓, SW↑, LW↓, SH, LH and GHF represent the Stefan-Boltzmann constant, skin temperature, downward solar radiation, upward solar radiation, downward longwave radiation, sensible heat flux, latent heat flux and ground heat flux, respectively. Since no GHF records exist in ERA5 and GHF is generally very small in the long-term average, this term is neglected in the equation above. Figure 9 depicts the comparison of the 10-year averaged (2009-2018) surface energy balance (SW↓ − SW↑ + LW↓ − SH − LH) between GZ9 and ERA5 and between CP3 and ERA5, as well as the difference between CP3 and GZ9, in order to reflect the spatial pattern of surface skin temperature, which is correlated with surface air temperature. Both WRF experiments tend to simulate more net shortwave radiation and downward longwave radiation (Fig. 9a, b) as well as more upward sensible and latent heat fluxes (Fig. 9d, e) than ERA5, especially over the northern TP. As a result, more net energy is retained at the surface in the WRF experiments over the northern TP, with positive deviations up to 20 W/m², while less is simulated over the southeastern TP, with negative deviations below −10 W/m² (Fig. 9g, h), which is highly consistent with the spatial pattern of T2m (Fig. 10). Compared with GZ9, CP3 simulates a much larger SW↓ − SW↑ + LW↓, which more than offsets its larger SH + LH, leading to more net energy at the surface and raising the T2m to some extent. The above comparison reveals that the spatial pattern of the surface energy balance is strongly connected with that of T2m over the TP with its complex terrain.
However, the lack of reliable observational surface radiation and heat flux products with suitable spatial and temporal resolution, especially for sensible and latent heat fluxes, makes it difficult to confirm this conclusion with datasets from other sources. More reliable observational products over the TP are therefore expected to make this finding more robust and convincing.
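The surface energy balance described above can be sketched numerically. The snippet below assumes the balance with GHF neglected and unit emissivity, as in the text; the flux values are hypothetical round numbers chosen for illustration, not values from Table 1 or Fig. 9.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def net_surface_energy(sw_down, sw_up, lw_down, sh, lh):
    """Net energy at the surface (W/m^2): SW↓ − SW↑ + LW↓ − SH − LH (GHF neglected)."""
    return sw_down - sw_up + lw_down - sh - lh

def skin_temperature(net_energy):
    """Skin temperature (K) from sigma * Ts^4 = net energy (unit emissivity assumed)."""
    return (np.asarray(net_energy, dtype=float) / SIGMA) ** 0.25

# Hypothetical summer-mean fluxes (W/m^2), for illustration only.
net = net_surface_energy(sw_down=250.0, sw_up=50.0, lw_down=300.0, sh=40.0, lh=60.0)
ts = float(skin_temperature(net))

# Sensitivity of Ts to a small change in net energy (here +3 W/m^2).
d_ts = float(skin_temperature(net + 3.0)) - ts
print(f"net = {net:.1f} W/m^2, Ts = {ts:.1f} K, dTs for +3 W/m^2 = {d_ts:.2f} K")
```

Because Ts scales with the fourth root of the net energy, a change of a few W/m² shifts the idealized skin temperature by only a fraction of a kelvin under these assumed fluxes. The sketch says nothing about how T2m responds in the coupled model.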
In addition, the two WRF experiments are compared quantitatively. The 10-year mean value of each term on the right-hand side of the equation is calculated and listed in Table 1. Over the TP and the southern TP (25°-35° N, 80°-105° E), CP3 consistently simulates more net energy than GZ9; the difference is nearly 3 W/m² over the southern TP, where the greatest difference in T2m is detected. Although CP3 receives slightly less downward longwave radiation than GZ9, this is more than compensated by the net shortwave radiation, which is nearly 10 W/m² larger in CP3. The slightly larger upward sensible heat flux in CP3 indicates a stronger temperature contrast between the surface and the near-surface atmosphere. Meanwhile, the larger upward latent heat flux in CP3 is closely related to its larger simulated precipitation, especially over the southern TP (analyzed in detail in the next section). Taking all these coupled physical processes into account, CP3 generally retains more net energy, mainly contributed by net shortwave radiation, which further warms the near surface. Improving the understanding of these coupled processes in order to optimize model physics configurations is therefore urgent for overcoming the persistent cold bias found in various models. Separate comparison of the factors affecting the surface energy balance in CP3 and GZ9 shows that the reduced cold bias in CP3 comes mainly from the increase in net shortwave radiation.
Fig. 6 The spatial distributions of TCCs for the simulated summer T2m, Tmax, and Tmin in CP3 (a-c), GZ9 (d-f) and ERA5 (g-i) as well as the difference of TCCs between CP3 and GZ9 for the simulated summer T2m, Tmax, and Tmin (j-l)
Thus, the cloud cover at 500-300 hPa, which is considered the low level of the atmosphere over the TP and plays a crucial role in modulating the radiation budget, is further examined against CERES-SYN, which is believed to have the best relative accuracy over the Qinghai-Tibet Plateau, to explain why there is more downward solar radiation in CP3. Snow coverage is also examined to explain why there is less upward solar radiation in CP3. Figure 11 presents the 10-year averaged summer mean cloud cover at 500-300 hPa from CERES-SYN, GZ9 and CP3, and the differences between them. Although the spatial pattern of low-level cloud cover is well simulated by WRF, with SCCs above 0.85, both experiments underestimate low-level cloud cover overall, which is related to the underestimation of downward solar radiation by more than 40 W/m² compared with CERES-SYN (figure not shown). Even so, CP3 has fewer low-level clouds over most of the TP, so less shortwave radiation is reflected back and the surface receives more energy. The daytime cloud cover at 500-300 hPa, which plays a more crucial role in modulating downward shortwave radiation, is additionally examined (figure not shown); CP3 also simulates fewer clouds, by about 8%, especially over the southeastern TP, which matches well with its larger downward shortwave radiation relative to GZ9. In addition, the 10-year averaged summer mean snow coverage from CERES-SYN and WRF shows more snow over the central TP and along the south slope of the TP in WRF (figure not shown), partly contributing to the cold bias of both experiments. However, this snow bias is reduced in CP3, so less shortwave radiation is reflected back and T2m rises to some degree.
[Figure caption, truncated: ... Fig. 6, but for RMSEs]
Meanwhile, the cloud cover above 300 hPa, which can be treated as the mid-high level over the TP, is also compared between the WRF simulations and the satellite-derived observation (Fig. 12). In both the observation and the WRF experiments, there are consistently fewer clouds above 300 hPa than at 500-300 hPa. WRF also captures the spatial pattern of cloud cover above 300 hPa, with SCCs of about 0.75 for GZ9 and 0.87 for CP3. As with low-level cloud cover, both experiments underestimate it, especially over the central and eastern TP, where the underestimation exceeds 10%; however, CP3 simulates more clouds than GZ9 over the central and eastern TP, contrary to the result for low-level clouds. Even so, CP3 still simulates less total cloud cover overall, defined as the integration of all clouds from 50 hPa to the surface, than GZ9 (figure not shown), which is consistent with the low-level pattern and helps the surface receive more shortwave radiation, raising T2m.
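The SCCs quoted for cloud cover can be illustrated with a minimal sketch of a spatial pattern correlation. It assumes that the model and observed fields have already been interpolated to a common latitude-longitude grid, that grid cells are weighted by the cosine of latitude, and that missing cells are excluded; the source does not state whether its SCCs are area-weighted, and the sample fields are hypothetical.

```python
import numpy as np

def spatial_corr(model, obs, lat):
    """Area-weighted pattern correlation of two (lat, lon) fields on the same grid.

    Cells where either field is NaN are ignored. The cos(latitude) weighting is
    an assumption of this sketch, not a detail confirmed by the source.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    lat = np.asarray(lat, dtype=float)
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(model)
    valid = np.isfinite(model) & np.isfinite(obs)
    w = np.where(valid, weights, 0.0)
    m = np.where(valid, model, 0.0)
    o = np.where(valid, obs, 0.0)
    # Weighted anomalies about the weighted spatial means.
    m_anom = m - (w * m).sum() / w.sum()
    o_anom = o - (w * o).sum() / w.sum()
    cov = (w * m_anom * o_anom).sum()
    return cov / np.sqrt((w * m_anom**2).sum() * (w * o_anom**2).sum())

# Hypothetical 3 x 4 cloud-cover fields (%), with one missing observation.
lat = np.array([26.0, 30.0, 34.0])
obs_cloud = np.arange(12, dtype=float).reshape(3, 4)
obs_cloud[0, 0] = np.nan
model_cloud = 0.8 * obs_cloud + 5.0  # same pattern, different magnitude

scc = spatial_corr(model_cloud, obs_cloud, lat)
print(f"SCC = {scc:.2f}")
```

A high SCC only indicates that the spatial pattern is reproduced. As in the hypothetical fields here, the magnitude can still be biased, which is consistent with a well-correlated but underestimated cloud cover.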

Water vapor
Considering that few stations are located at mountain peaks and ridges, the evaluation of precipitation above is largely limited to valleys. To obtain a more comprehensive understanding of the difference in simulated precipitation between CP3 and GZ9, the simulated precipitation in both experiments is additionally compared with IMERG. Figure 13 shows the 10-year averaged summer precipitation of IMERG, GZ9 and CP3, and the differences between them. Summer precipitation mainly occurs over the southeastern TP and along the south slope of the TP, which is consistent with the comparison against station observations. Generally, CP3 simulates about 1 mm/day more precipitation than GZ9 over the southern TP but less over the northern TP, with more clearly reduced RMSEs when IMERG is used as the benchmark, indicating that the choice of observation can strongly influence the evaluation of model performance. Meanwhile, larger specific humidity is found at 500 hPa over the southern TP in CP3 than in GZ9 (Fig. 14). Larger specific humidity is also found in CP3 at 300 hPa and 200 hPa (Figs. 15, 16), which is related to more clouds above 300 hPa in CP3. The fact that, over the southern TP, more precipitation, larger specific humidity at the low level (mid-high level) and more clouds at the low level (mid-high level) co-exist indicates that more water vapor is simulated in CP3 than in GZ9, and raises the question of why CP3 tends to produce more water vapor over the southern TP. Figure 17a, b show the topography of 26.5-29° N, 85.5-90° E (framed with black lines in Fig. 16b-d) in CP3 and GZ9, where the very complex terrain is of particular concern.
Fig. 8 The spatial distribution of TCCs, RMSEs and RBs of precipitation in CP3 (the first row), GZ9 (the second row) and ERA5 (the third row). The difference of TCCs, RMSEs and RBs between CP3 and GZ9 is shown in the fourth row
The steep terrain here is generally considered a barrier to water vapor transport onto the TP; however, many meridional canyons in this region may function as vapor channels (Bookhagen and Burbank 2010). GZ9 heavily smooths the topography, while CP3 represents the steep terrain in more detail. Moreover, CP3 outperforms GZ9 in realistically distinguishing pathways and barriers for vapor transport, reproducing the possible meridional canyons and depicting the channeling effect of valleys, as supported by the stronger WVT from the south into the TP shown in Fig. 17c, d. The northward WVT is more active in CP3 and provides the water vapor necessary for the formation of precipitation and clouds. This improvement in CP3 benefits from the better representation of topography and related near-surface physical processes, confirming the resolution dependency of WVT. Therefore, finer model resolution becomes critical not only for realistically representing more mesoscale features but also for simulating physical processes such as WVT over complex terrain like the TP.
Fig. 9 The difference of 10-year averaged radiative energy, non-radiative energy and the energy balance between CP3 and ERA5 (a, d, g), GZ9 and ERA5 (b, e, h), and the difference between CP3 and GZ9 (c, f, i). In the first row, received radiative energy (SW↓ − SW↑ + LW↓) is represented with positive values. In the second row, non-radiative energy loss (SH + LH + GHF) is represented with positive values. In the third row, net energy obtained at surface (SW↓ − SW↑ + LW↓ − SH − LH − GHF) is represented with positive values
To conclude, sufficient model resolution proves beneficial for capturing more terrain features over the southern TP, while coarse resolution suffers from greater information loss. The topographical barrier and channeling effects, which are better simulated in CP3, are reflected in the more realistically simulated meridional canyons and stronger northward WVT over the southern TP. Consequently, at the low level, the more abundant water vapor is strongly influenced by the steep terrain and tends to produce more precipitation over the southern TP in CP3, which is supported by the spatial patterns of precipitation and of specific humidity at 500 hPa. At the mid-high level, by contrast, the modulating effect of steep terrain weakens considerably and the surplus water vapor is able to spread farther over the whole TP, which favors the formation of more clouds over most of the TP. This deduction is supported by the spatial pattern of specific humidity at 300 and 200 hPa, explaining the greater cloud cover above 300 hPa in CP3.
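The v component of total column water vapor transport discussed above can be sketched as a pressure integral. The snippet assumes the standard pressure-coordinate definition, Qv = (1/g) ∫ q v dp, evaluated with the trapezoidal rule; the source does not give its exact formula or integration limits, and the profile values below are hypothetical.

```python
import numpy as np

G = 9.80665  # standard gravity, m s^-2

def column_meridional_wvt(q, v, p):
    """Vertically integrated meridional water vapor transport (kg m^-1 s^-1).

    q : specific humidity (kg/kg), v : meridional wind (m/s), p : pressure (Pa).
    Levels lie along axis 0 and may be given in either order. The integral
    (1/g) * ∫ q v dp is approximated with the trapezoidal rule; this is the
    textbook definition, assumed here rather than taken from the source.
    """
    q = np.asarray(q, dtype=float)
    v = np.asarray(v, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)            # integrate from low to high pressure
    p_sorted = p[order]
    flux = (q * v)[order]            # q*v on each level
    dp = np.diff(p_sorted)           # layer thickness in Pa
    layer_mean = 0.5 * (flux[1:] + flux[:-1])
    return np.tensordot(dp, layer_mean, axes=(0, 0)) / G

# Hypothetical profile between 500 and 200 hPa with uniform q and southerly wind.
p = np.array([50000.0, 40000.0, 30000.0, 20000.0])  # Pa
q = np.full(4, 0.002)                                # kg/kg
v = np.full(4, 5.0)                                  # m/s, positive = northward

wvt = float(column_meridional_wvt(q, v, p))
print(f"northward WVT = {wvt:.2f} kg m^-1 s^-1")
```

Positive values indicate northward transport. Applying the same function to the CP3 and GZ9 profiles at each grid point and differencing the results would give a field comparable in spirit to the difference panel of Fig. 17, subject to the assumptions stated above.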

Conclusions
Two WRF experiments using spectral nudging for regional climate simulation, at gray-zone and convection-permitting resolution, are performed over the TP from 2009 to 2018. The surface air temperature and the precipitation are evaluated against in-situ observations, and possible mechanisms are discussed.
Both WRF experiments successfully capture the spatial patterns and the daily variations of T2m, Tmax, and Tmin, with SCCs and TCCs higher than 0.9. A general cold bias is found, especially over the regions south of 35°N. [...] in reproducing the spatial pattern and daily variation of temperature, with greater underestimation and larger RMSEs than WRF experiments. Therefore, compared with ERA5, the added value of WRF simulation at both gray-zone scale and convection-permitting scale is mainly reflected in the reduction of cold bias, and more improvement can be achieved with finer spatial resolution, especially over the southern TP. Further investigation into the surface radiation balance reveals that the surface energy balance is strongly related to the simulation of T2m, and the improved surface energy balance in CP3 contributes substantially to its reduced cold bias in T2m. Meanwhile, for CP3, the improved surface energy balance, which mainly comes from the net solar radiation, is caused by fewer low-level clouds and less snow coverage.
Fig. 10 The first row: 10-year averaged (2009-2018) summer mean T2m for ERA5, CP3 and GZ9; the second row: the difference of T2m between CP3 and ERA5, between GZ9 and ERA5, and between CP3 and GZ9
[Figure caption, truncated: ... Fig. 11, but for cloud cover above 300 hPa]
The spatial pattern and daily variation of summer precipitation are also reasonably reproduced in both WRF experiments, with SCCs larger than 0.7 and TCCs larger than 0.9. WRF clearly underestimates summer precipitation, especially over the southern TP, while ERA5 tends to greatly overestimate it when compared with station observations. Compared with ERA5, the added value of the WRF experiments is reflected in the reduced bias over the southeastern TP, whereas with higher spatial resolution the improvement in dry bias relative to station observations seems limited, which may be related to the physical parameterization in the model configuration as well as the limited representativeness of station locations. In CP3, larger specific humidity at the low level coexists with more precipitation and more low-level clouds, especially over the southern TP. At the same time, larger specific humidity at the mid-high level coexists with more mid-high clouds over the southern TP. Further, CP3 is shown to outperform GZ9 in characterizing more detailed terrain features and thus in more realistically simulating the possible meridional canyons for WVT. The stronger WVT in CP3 provides sufficient water vapor for precipitation at the low level, which is mostly confined to the southern TP under the influence of steep terrain, while at the mid-high level the water vapor spreads farther, leading to wetter conditions over most of the TP.
Fig. 13 The first row: 10-year averaged (2009-2018) summer mean precipitation for IMERG, CP3 and GZ9; the second row: the difference of precipitation between CP3 and IMERG, between GZ9 and IMERG, and between CP3 and GZ9
Fig. 14 The first row: 10-year averaged (2009-2018) summer mean specific humidity at 500 hPa for ERA5, CP3 and GZ9; the second row: the difference of the specific humidity at 500 hPa between CP3 and ERA5, between GZ9 and ERA5, and between CP3 and GZ9
Based on the analysis of surface air temperature and precipitation over the TP from 2009 to 2018, WRF experiments at gray-zone and convection-permitting scales generally show comparable performance in reproducing the spatial and temporal variation of multiyear summer climate. A higher horizontal resolution therefore has complex effects on the simulation results. For example, our results show that even though CP3 can reduce the RMSEs for temperature, its ability to improve the simulation of precipitation is limited when judged against available observations with sufficient accuracy, such as station observations. In addition, to save computational cost, experiments with gray-zone resolution are also a better choice, especially for simulating long-period climate evolution over the TP. Future studies in this area should include establishing more reliable gridded observations with high temporal and spatial resolution to support the evaluation of convection-permitting simulations, further exploring the advantages of experiments with convection-permitting resolution, and improving experiments with gray-zone resolution so that they can be better utilized in climate studies.
Fig. 17 The first row: topography of the target region which is framed with black lines in Fig. 16; the second row: spatial pattern of total column water vapor transport (vector) and its v component (color) in CP3 and GZ9; the third row: the difference of the v component of total column water vapor transport between CP3 and GZ9