Advantage and Disadvantage of Global and Local Climate Datasets on Modeling Species Distribution at Continental and Landscape Scales

Species distribution models based on global and local climate datasets were hypothesized to have advantages in projecting distribution range at continental and landscape scales, respectively. Random Forest (RF) and principal component analysis (PCA) were used to project the potential distribution range and to construct the climate space of Bretschneidera sinensis in continental East Asia (CEA) and northern Taiwan (NTWN) based on the WorldClim and local climate datasets. At continental scale, RF based on the WorldClim dataset was able to project the geographical extent of the endangered species, but the projection map was biased, showing gridded squares at the edges of the potential distribution range. At landscape scale, the RF projection map for NTWN based on the WorldClim dataset showed a gridded distribution far from the empirical distribution pattern, whereas the map based on the local climate dataset showed a distribution pattern related to elevation and topography. PCA revealed climate differentiation between continental and island populations. The local climate dataset reflected climate heterogeneity at landscape scale and is essential for identifying local adaptation of the island population at the geographical margin of the endangered species. However, the local climate interpolation method generates a huge number of gridded cells at landscape scale, so its geographical extent cannot readily be expanded to a continental region. The global climate dataset has the advantage in modeling the geographical extent of plant species at continental scale, while the local climate dataset enables conservationists to delineate reliable conservation areas in fragmented natural habitats at landscape scale.

However, the downscaled climate dataset from WorldClim was presumably not fine enough to reflect local climatic heterogeneity in mountainous areas, since local climatic heterogeneity is strongly induced by altitudinal and topographical gradients (Chao et al. 2010; Chen et al. 1997; Ma et al. 2010; McKenzie et al. 2003; Tu et al. 2010). In particular, distributions of rare species with locally restricted ranges are often critically affected by local elements, such as the occurrence of favorable microsites and microclimates (Elith & Leathwick 2009; Heikkinen et al. 2012). Such special local ecological requirements can be difficult to distinguish in a broader-scale climate dataset (Guisan et al. 2007). A few SDM studies have used high-resolution environmental data to predict species distribution ranges in mountainous areas (Lannuzel et al. 2021; Raes et al. 2009; Tomlinson et al. 2020), but these models used topographical elements that were not directly related to the ecological processes constraining species distribution (Tomlinson et al. 2020). Therefore, uncertainties remain in the correlation of species distribution with climate factors when downscaled climate data from the WorldClim dataset or topographical elements are applied to model species distribution at fine scale (Lannuzel et al. 2021; Tomlinson et al. 2020). To model species distribution range and correlate species distribution with climate factors at landscape scale, a previous study proposed an interpolation method that generates a gridded climate dataset with a high spatial resolution of 50 m × 50 m from daily data of local meteorological stations (Liao & Chen 2021). That method was presumably appropriate for modeling the distribution of plant species in relation to the climatic heterogeneity induced by elevation and topography, and for identifying the climate factors that drive species distribution in mountainous areas.
However, the high-resolution local climate dataset generates a huge number of gridded cells even within a local geographical extent, so it is questionable whether the method can be used to expand the gridded climate dataset to a continental region.
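The scaling concern can be made concrete with a short calculation. The sketch below is illustrative only: the two study-area sizes are hypothetical round numbers rather than values from this study, and `cell_count` is a helper introduced here for the example.

```python
import math


def cell_count(area_km2: float, cell_size_m: float) -> int:
    """Number of square cells of side cell_size_m needed to tile area_km2."""
    area_m2 = area_km2 * 1_000_000.0
    return math.ceil(area_m2 / (cell_size_m ** 2))


# Hypothetical extents, chosen only for illustration (not taken from the study).
extents_km2 = {"landscape": 5_000.0, "continental": 10_000_000.0}

for label, area in extents_km2.items():
    coarse = cell_count(area, 1000.0)  # roughly the 30-arc-second WorldClim scale
    fine = cell_count(area, 50.0)      # the 50 m local grid
    print(f"{label}: {coarse:,} cells at 1 km, {fine:,} cells at 50 m")
```

Each 1 km cell contains 400 cells of 50 m, so the fine grid multiplies the cell count by 400 regardless of extent. That is manageable for a landscape but becomes a very large number for a continent.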
Species distribution models (SDMs) can provide useful predictions for insufficiently surveyed areas and provide guidelines for conservation planning tasks, such as identifying suitable habitats and seeking new populations of rare species (Fois et al. 2015; Mi et al. 2017). In the last century, economic development, urbanization, and population growth led to substantial modification of natural ecosystems, while habitats of the plant species were fragmented and are currently formed by complex mosaics of rural or urban areas, abandoned grasslands, agricultural landscapes, and secondary forests (Dong et Wang et al. 2018). Populations of the species are quite small and have suffered from isolation and fragmentation of habitats (Dong et al. 2019). Distribution patterns of the endangered plant species in the continental region and the local area were projected by RF based on the WorldClim and local climate datasets to evaluate its geographical extent and locally restricted distribution range, and were expected to provide valuable information for effective conservation.
The objective of this study was to identify the advantages and disadvantages of the global and local climate datasets at continental and landscape scales. Two hypotheses were assessed. (1) The WorldClim dataset was hypothesized to have the advantage in projecting the geographical distribution range and quantifying the climate space of plant species at continental scale, whereas it was presumably not appropriate for projecting species distribution at landscape scale. (2) The local climate dataset was presumed to capture the climatic heterogeneity induced by elevation and topography in mountainous areas and to have the advantage in projecting species distribution range and quantifying climate space at landscape scale. Model prediction based on the local climate dataset was further expected to capture fine-scale ecological characteristics and to provide valuable geographical information for developing effective conservation management of the endangered species.

Plant species and collections
Bretschneidera sinensis Hemsl., a deciduous tree of a monotypic genus in the family Akaniaceae, is a relict species of the Tertiary tropical flora that mainly occurs in evergreen broad-leaved or mixed evergreen and deciduous forests in mountainous areas at altitudes of 300-1,700 m above sea level. Georeferenced occurrences of B. sinensis in continental East Asia (CEA) were downloaded from the Global Biodiversity Information Facility (GBIF). A total of 158 georeferenced data records in CEA were obtained from the GBIF (Fig. 1). Presence data of B. sinensis in northern Taiwan (NTWN) were obtained from the herbarium of the Taiwan Forestry Research Institute, the herbarium of National Taiwan University, and the herbarium of Academia Sinica, Taipei. A total of 72 data records were collected for NTWN (Fig. 1).

Climate datasets and variables selection
The WorldClim dataset (available at https://www.worldclim.org/) with 30-arc-second spatial resolution (approximately 1 km at the equator) is one of the most commonly used climate datasets in modeling species distribution (Fick & Hijmans 2017), and it was applied to predict the current potential distribution range of B. sinensis in CEA. The CEA was divided into gridded cells with a spatial resolution of 30 minutes (approximately 50 km) to capture climate variables from the WorldClim dataset. A total of 1.5 million gridded cells were generated in CEA, including Mongolia, China, India and Indochina, and were applied for model prediction. The projection map in CEA based on the WorldClim dataset was presented to show the distribution pattern of B. sinensis at continental scale. In this study, 13 climate variables (9 temperature and 4 precipitation variables) were adopted for constructing the RF model (Table 1).

Table 1 Bioclimatic predictors used for modeling the potential distribution range of Bretschneidera sinensis Hemsl.

Results
Potential distribution range of B. sinensis
Distribution maps of B. sinensis in CEA were projected by RF from presence and absence data with climate variables extracted from the WorldClim dataset (Fig. 2a). The AUC (area under the receiver operating characteristic curve) value of the model was 0.995. The distribution map in CEA projected by RF showed suitable habitats of the plant species in Southern China. The model effectively presented the potential distribution range of the plant species at continental scale. However, the projection map based on the WorldClim dataset showed no isolation or fragmentation of potential habitats in Southern China, which contradicts the empirical distribution pattern of the endangered species. Furthermore, the projection map showed a pattern of gridded squares at the edge of the potential distribution range in a larger scale map (Fig. 2b and 2c). Thus, the projection map based on the WorldClim dataset may not reflect the empirical distribution pattern of the plant species, and the distribution pattern based on the WorldClim dataset remains uncertain at continental scale.
The potential distribution range of the plant species was further examined in NTWN, where the maps projected by RF based on the WorldClim and local climate datasets showed completely different distribution patterns at landscape scale (Fig. 3a and 3b). The distribution map of B. sinensis based on the WorldClim dataset was unrelated to topography and elevation, since it showed a gridded distribution pattern around the presence data records (Fig. 3a). This gridded pattern in the mountainous areas of NTWN resembled the edge of the potential distribution range of the species in CEA and is an unrealistic distribution pattern. In contrast, the distribution map of B. sinensis based on the local climate dataset followed topography and elevation (Fig. 3b) and is more likely to represent the empirical distribution pattern at landscape scale.
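One way to see why a coarse predictor grid produces blocky projections is to sample the same two locations from a fine grid and from its coarse aggregate. The sketch below uses a synthetic elevation surface and an assumed lapse rate of 6 °C per km; none of the numbers come from the study, and `sample` is a helper introduced for illustration.

```python
import numpy as np

FINE_M = 50       # fine cell size (m)
COARSE_M = 1000   # coarse cell size (m)
EXTENT_M = 4000   # side length of the square toy area (m)

n_fine = EXTENT_M // FINE_M    # 80 fine cells per side
block = COARSE_M // FINE_M     # 20 fine cells per coarse cell side
n_coarse = n_fine // block     # 4 coarse cells per side

# Synthetic elevation (m) that rises steadily across the area.
rows, cols = np.indices((n_fine, n_fine))
elevation = 300.0 + 10.0 * cols + 5.0 * rows

# Temperature from an assumed lapse rate of 6 degrees C per 1000 m.
temp_fine = 22.0 - 0.006 * elevation

# Coarse grid: the mean of every 20 x 20 block of fine cells.
temp_coarse = temp_fine.reshape(n_coarse, block, n_coarse, block).mean(axis=(1, 3))


def sample(grid: np.ndarray, cell_m: int, x_m: float, y_m: float) -> float:
    """Value of the grid cell that contains the point (x_m, y_m)."""
    return float(grid[int(y_m // cell_m), int(x_m // cell_m)])


# Two hypothetical occurrence points that fall inside the same 1 km cell.
point_a = (120.0, 80.0)
point_b = (930.0, 870.0)

for name, pt in [("a", point_a), ("b", point_b)]:
    print(name, "fine:", round(sample(temp_fine, FINE_M, *pt), 2),
          "coarse:", round(sample(temp_coarse, COARSE_M, *pt), 2))
```

Both points receive the same coarse value even though their fine-grid temperatures differ. A model trained on the coarse values cannot separate them, so its predictions change only at cell boundaries.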
The out-of-bag error rates (ERR_OOB) of the two projection results in NTWN did not differ significantly in a statistical test (0.9177 for the WorldClim and 0.9103 for the local climate dataset). Moreover, the climate variables that contributed most to model performance differed between the two climate datasets (Fig. 4). Temperature (Bio2) derived from the WorldClim dataset was the most important factor affecting model performance, while water availability (Bio19) contributed most to the performance of the model based on the local climate dataset.
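The quantities reported here (out-of-bag error, AUC, and variable importance) can be obtained from a random forest roughly as sketched below. This is a minimal sketch on synthetic data, assuming scikit-learn. The text does not state which software was used, the predictor names are borrowed from the text only as labels, and the presence rule is invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 400
names = ["Bio2", "Bio12", "Bio19"]  # labels only; the values below are synthetic

# Synthetic, standardized predictors for n grid cells.
X = rng.normal(size=(n, len(names)))

# Invented rule: presence (1) depends mainly on the third predictor.
y = (X[:, 2] + 0.3 * rng.normal(size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X, y)

# Out-of-bag error: share of samples misclassified by trees that never saw them.
err_oob = 1.0 - rf.oob_score_

# AUC computed from out-of-bag presence probabilities.
auc = roc_auc_score(y, rf.oob_decision_function_[:, 1])

# Predictors ranked by impurity-based importance, highest first.
ranking = sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1])

print(f"ERR_OOB = {err_oob:.3f}, AUC = {auc:.3f}")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

These metrics measure how well presence and absence records are separated. They do not check whether the mapped pattern is spatially realistic, so two models with similar error rates can still produce very different maps.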
Quantification of climate spaces by PCA
PCA was conducted to identify the climate characteristics of CEA and NTWN and to depict the climate space of the plant species. Principal component 1 (PC1) accounted for 97.28% of the variation, while principal component 2 (PC2) accounted for 2.34% (Fig. 5). PC1, PC2, and principal component 3 (PC3) were strongly related to Bio12, Bio19 and Bio18, respectively (Table 2). These three variables all describe water availability, which evidently played the most important role in quantifying the climate spaces of B. sinensis. The climate environment in NTWN based on the local climate dataset had a wider range along PC1 than that based on the WorldClim dataset (Fig. 5). B. sinensis occupies three different climate spaces based on the WorldClim and local climate datasets, represented by dark circles in the PCA diagram (Fig. 5). In particular, the plant species occupies distinct climate spaces in NTWN depending on whether the WorldClim or the local climate dataset is used (Fig. 5). Notably, the same geographical coordinates of the plant species yielded distinct climate spaces under the different climate datasets. Field verification of the empirical distribution pattern, based on expert knowledge, is therefore necessary to identify which RF projection and which PCA climate space are closer to the empirical distribution pattern at landscape scale. Furthermore, statistical tests were performed to detect which variable was most responsible for the different climate spaces in the PCA diagram. Surprisingly, analysis of variance (ANOVA) and Tukey's test showed that all variables differed significantly among the three climate spaces of the plant species (Table 3). [...] (Fig. 6), since the CEA had a broad latitudinal range.
However, annual precipitation in NTWN based on the local climate dataset had an extraordinarily broad range compared with that in CEA and NTWN based on the WorldClim dataset (Fig. 6). The wide range of NTWN's climate environment along PC1 was caused by extraordinarily high annual precipitation, which was partly a consequence of winter precipitation being high in coastal areas and lower inland. The local climate dataset generated from daily data of meteorological stations was more likely to capture the variation in local climate characteristics and was able to reflect climate heterogeneity in NTWN at landscape scale, which the WorldClim dataset could not exhibit.
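A precipitation variable can dominate the first principal component simply because its numerical variance is much larger than that of temperature variables. The sketch below uses synthetic data and NumPy only. The text does not state whether the variables were standardized before PCA, so both cases are shown as an illustration rather than a reconstruction of the analysis. `pca` is a helper written for this example.

```python
import numpy as np


def pca(X: np.ndarray):
    """Return (explained-variance ratios, loadings) with one row per component."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (len(X) - 1)                  # variance along each component
    return var / var.sum(), vt


rng = np.random.default_rng(1)
n = 500

# Synthetic predictors (not from the study): temperature in degrees C,
# annual precipitation in mm. Their variances differ by orders of magnitude.
temp = rng.normal(18.0, 3.0, n)
precip = rng.normal(2000.0, 600.0, n)
X = np.column_stack([temp, precip])

# PCA on raw values: the high-variance variable dominates PC1.
ratio_raw, load_raw = pca(X)

# PCA on standardized values: both variables contribute on equal footing.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
ratio_std, _ = pca(X_std)

print("raw PC1 share:", round(float(ratio_raw[0]), 4))
print("raw PC1 loading on precipitation:", round(float(abs(load_raw[0, 1])), 4))
print("standardized PC1 share:", round(float(ratio_std[0]), 4))
```

When reading a PC1 share close to 100%, it is therefore worth checking the units and scaling of the input variables before interpreting the component ecologically.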

Discussion
To avoid misleading results caused by the WorldClim dataset, the local climate dataset is suggested as the source of bioclimatic predictors for model prediction at landscape scale. The projection maps of the plant species in NTWN based on the WorldClim and local climate datasets were verified in the field based on expert knowledge, and the field examination identified that the projection map based on the local climate dataset was closer to the empirical distribution pattern of the plant species. Accordingly, model performance was much better when the model was calibrated with the local climate dataset at landscape scale. Bioclimatic predictors from the local climate dataset precisely reflected the climate characteristics induced by elevation and topography (Fig. 7) and are suggested for model projection to ensure the accuracy of SDM performance in mountainous areas. Model projection at landscape scale is the advantage of local [...]

On the other hand, previous studies proposed that RF models with a low error rate (ERR_OOB) have more accurate performance. Our study confirmed that a low error rate (ERR_OOB) can be achieved even when the model did not project an accurate distribution range and the projection map was [...]

Characteristics of bioclimatic predictors from the WorldClim (left column) and local climate datasets (right column). Bioclimatic predictors from the WorldClim showed conspicuous gridded squares that could not reveal the climate heterogeneity induced by elevation and topography, while those from the local climate dataset better reflect elevation and topographical features. The first row is annual temperature