Urban heat island estimation from improved selection of urban and rural stations by DTW algorithm

Typical urban and rural temperature records are essential for the estimation and comparison of urban heat island effects in different regions, and one of the key issues is how to identify the typical urban and rural stations. This study tried to analyze the similarity of air temperature sequences by using dynamic time warping algorithm (DTW) to improve the selection of typical stations. We examined the similarity of temperature sequences of 20 stations in Beijing, and the results indicated that DTW algorithm could identify the difference of temperature sequence, and clearly divide them into different groups according to their probability distributions. DTW algorithm could create more rigorous and reasonable classification for typical urban and rural stations in Beijing by comparing with remote sensing and cluster analysis method. We also found that some traditional rural stations did not represent temperature variation in rural areas because their surrounding environments have been highly modified by urbanization process in last decades, and this might lead to an underestimate of the urban climate effect by 1.24℃. The study suggests that DTW algorithm is a simple and efficient method for selecting appropriate temperature records and analyzing temperature sequences with a good potential in improving urban heat island estimation.


Introduction
Urban heat island (UHI) is quantified by the temperature difference between urban and rural areas with temperature records collected from meteorological stations or satellite platforms (Camilloni and Barros 1997;Stewart and Oke 2012;Voogt and Oke 2003). A pair of stations in urban and rural regions are usually used to delineate the situation of surface temperature variation and estimate the magnitude of canopy urban heat island. The accuracy of estimated UHI intensity highly depends on whether an urban/rural station can accurately capture climate characteristics of urban and rural surfaces (Camilloni and Barros 1997;Stewart and Oke 2012;Voogt and Oke 2003). Urban/rural site identification from different data sources, such as population, night light images, and land use/cover datasets also affects UHI estimation (He et al. 2007; Mohsin and Gough 2012;Parker 2010), which makes UHI comparison across regions difficult (Oke 2006). Single pair of stations for UHI estimation may be reasonable for those cities not experiencing rapid urban expansion with relatively stable and a clear urban-rural border. However, urban development induced by city economy or population increase usually results in urban expansion and urban morphology change, and further modifies land properties of urban surroundings (Hu et al. 2015a, b;Jia et al. 2015;Normile 2008). The land cover transformations induced by the urbanization process can greatly alter the temperature records of stations at a peri-urban area, increasing uncertainties of local climate estimations, such as the possible underestimates of urban warming magnitude and 1 3 urban heat island intensity (Stewart and Oke 2012;Wu and Yang 2013;Yan et al. 2010).
It is important to identify the bias of UHI estimation induced by the difference of representative site selection (Mohsin and Gough 2012;Stewart and Oke 2012). With more and more field meteorological instruments being equipped at urban and its surroundings, averaged temperature records from multiple stations in typical land cover have already been used to quantify UHI (He et al. 2007;Kim and Baik 2005;Tan et al. 2010). The temperature from each meteorological station could only represent its limited footprint of surface energy variation (He et al. 2013). Therefore, the local climate zone was proposed as a research framework for urban climate studies, creating a comprehensive climatebased classification to define the local physical conditions of a field site (Stewart and Oke 2012). More physical explanations of UHI intensities could be derived from the comparison between the common surface and exposure characteristics of the selected field sites in specific cities. However, it is not convenient to compare UHI effects across regions due to the different derivations of UHI intensities. It is still a challenge to derive more representative UHI results by using air temperature data from a limited amount of observation sites.
Temperature time series provide a possibility for insights into climate trend variation, especially the evolution of urban climate. Land surface temperature from satellite images also provides a detailed spatial characterization of temperature variation in different spatial scales, which have been widely used in surface UHI estimation (Voogt and Oke 2003;Weng et al. 2004). They are highly correlated to air temperature variation and can estimate air temperature for those regions with sparse meteorological observations (Vancutsem et al. 2010;Zhu et al. 2013). Therefore, satellite-based land surface temperature (LST) is a good proxy for examining surface thermal conditions at local climate zone and evaluating whether appropriate observation sites are included in urban climate analysis. Air temperature and LST records have been used to quantify UHI in the urban canopy and urban surface, respectively (Hu et al. 2019). Averaged temperature is usually used to quantify UHI intensity to eliminate the possible uncertainties induced by surface heterogeneity of meteorological stations (Jin 2012). Standard classes defined by local climate zone have homogeneous surfaces with specific thermal characterization, which are useful for quantifying urban climate effects in different cities (Stewart and Oke 2012). Urban surfaces with consistent thermal characterization usually have similar temperature fluctuations, and their observation records are helpful in examining the urban thermal environment, surface energy budget, and UHI effect. Rapid urbanization has resulted in many meteorological stations suffering from the invasion of man-made infrastructures; then, the surface energy budget induced by the land cover change will be introduced to temperature records (Yan et al. 2010). Subsequently, the urban climate effect can be underestimated.
Dynamic Time Warping (DTW) is an algorithm for measuring similarity between two temporal sequences, and it has been already widely used in land use/cover mapping and cropland detection due to its advantages in classification capability for the reduced number of samples or time series from satellite images (Belgiu and Csillik 2018;Maus et al. 2016;Wei et al. 2020). The land cover classification accuracy by the DTW algorithm is higher than the MODIS land cover product with a global accuracy of 87.32% (Maus et al. 2016). Compared with other classification methods, such as the isometric feature mapping method and Euclidean distance method, the DTW algorithm often provided better results by considering the seasonality of crops and their temporal distortions in their alignment, especially time constraints for DTW further improving the classification accuracy (Csillik et al. 2019;Nduati et al. 2019;Wang et al. 2020;Zhai et al. 2020). Due to the classification advantages of the DTW algorithm, it has a potential in identifying the difference in the thermal environment in an urban or rural area by distinguishing their temporal similarity of temperature sequences from multiple meteorological sites.
Currently, an open-source R code has been developed to conduct the similarity analysis for land classification using remote sensing images (Maus et al. 2019). Meanwhile, air temperature change in an urban area with natural phenological variability can be flexibly handled by the DTW similarity measures to warp their temporal signatures, and may efficiently account for local climate zone differences and surface warming caused by natural and anthropogenic factors. Therefore, this study tried to use the DTW algorithm to improve the accuracy of UHI estimation by selecting typical urban and rural station combinations of 20 stations in Beijing. The algorithm was also used to verify air temperature time series for further identifying which one is the best combination of observation sites in estimating UHI intensity. Finally, we analyzed the possible improvement in UHI estimation by using the combination analysis from specific multiple-group temperature records.

Dynamic time warping algorithm
With its advance in automatic recognition and partial shape matching application for sequences of different lengths, the DTW algorithm has been widely applied to analyze the consistency of temporal sequences (Keogh and Pazzani 1999;Wang et al. 2013Wang et al. , 2020. We can construct an n × m matrix for temperature sequences of A and B with length n and m respectively: where the (i th , j th ) element of this matrix contains the Euclidean distance between the two points A i and B j .
To derive the optional distance, a warping path W can be conducted for sequences A and B.
In order to find the best match or alignment between these two sequences, we need to find a path in the distance matrix which minimizes the total distance between them.
The cumulative distance, (i, j) , is the minimum of the sum of the distances between the individual elements on the path divided by the sum of the weighting function, which can be deduced by the following recursive function:

Temperature sequence selection
Currently, an extensive observation station network has been established in Beijing to monitor the urban environment and urban climate (Table 1). These stations mainly distribute over city limits and surrounding urban areas with their elevation that ranged from 21 to 1225 m. Temperature records of 20 meteorological stations in Beijing were collected from China Meteorological Administrations in this study to create temperature sequences from 1987-2016 for the DTW algorithm ( Fig. 1). According to the Economic Census of Beijing, Beijing experienced rapid urbanization during the last decades with its urbanization rate greater than 85% in 2019. Dramatic urban expansion already modified local climate, and the surroundings of those stations invaded by urban land will cause the rise of air temperature records in addition to regional climate warming (He et al. 2013), especially in rural stations.

Similarity derivation by DTW
The DTW algorithm is programmed based on Python language to conduct the similarity analysis for temperature sequences. The results from our codes are consistent with the DTW-python package from https:// pypi. org/ proje   Table 1 station combinations for better estimating UHI intensities, and it provides an efficient way of understanding the relationship of different climate record series.

Discriminant analysis according to similarity data
We calculated all the DTW distances for 20 stations to create a matrix to examine their similarity of temperature series. Then, the cumulative probability distribution analysis for the distance information was used to analyze their agglomeration. The stations with higher agglomeration and lower DTW distance were classified as typical urban and rural sites, which were considered as the most appropriate sites to examine UHI intensity. A typical anchor for urban and rural stations could be identified from the station clusters according to their location. We removed those stations with worse similarity using the percentile thresholds from probability distribution analysis. Finally, the rest of the stations was further classified according to the similarity with anchor stations.

Validation analysis
In order to validate the discriminant results from the DTW algorithm, we compared them with those from the cluster analysis method and remote sensing images, further evaluating the similarity of UHI sequences from air temperature and land surface temperature to justify the classification results by DTW algorithm.

Compared with cluster analysis method
The cluster analysis method was performed to distinguish the closed station clusters due to its advantage in sorting data according to shared characteristics without the need for training data (Stearns et al. 2016). We created a matrix for temperature sequence data from each station with each column representing one station. The correlation method was selected for calculating the distance between temperature sequences. To compare with DTW results, 3-4 station clusters were selected to better examine the variation of the final partition according to the changes in similarity level. The hierarchical procedure created the groups for temperature sequences by calculating their similarity and distance values, and displayed a dendrogram for examining the grouping results at each step. The Minitab software was used in this study to finish clustering analysis for the final grouping of temperature sequences.

Comparison with remote sensing results
Urban and rural stations have been usually defined by city population and their location, and the land cover type derived from remote sensing images is another important source to distinguish urban and rural stations for examining urban climate (Hu et al. 2019;Wu and Yang 2013). The City Clustering Algorithm for National land use/cover dataset of China (NLCDC) from the 1980s to 2015 was used in this study to distinguish the urban and rural land (Hu et al. 2019;Peng et al. 2012). According to the criterion that the land cover type of 8-neighbors of the given pixel is urban land, pure urban pixels from land use datasets were identified and aggregated. Urban fringe could be identified using the moving window analysis to examine whether half of each window belongs to urban land. Finally, urban surroundings, including large areas of cropland, woodland, grassland, and water, were considered as rural surfaces. Finally, we identified station types according to their location in the reclassified land use/cover map.

UHI similarity from air temperature and land surface temperature
To validate the station classification results by the DTW algorithm, we investigated the similarity of UHI sequences from air temperature and satellite-based surface temperature. Theoretically, stable urban/rural stations have similar air temperature fluctuations or warming signals induced by climate background (He et al. 2013), which may cause lower DTW distances of a pair of stations. The composite land surface temperature products (MYD11A2 and MOD11A2) from 2003 to 2016 were selected to estimate UHI intensity. Derived surface temperature from 4 times observation by Terra/Aqua-MODIS (local time 10:30 AM and 22:30 PM from Terra, 13:30 PM and 01:30 AM from Aqua) illustrated surface radiation energy variation influenced by the daily insolation, which was correlated to daily mean air temperature observed in meteorological station. UHI sequences could be derived from the land surface temperature difference between the urban and rural surfaces. Due to the limited land use information of station surroundings, traditional remote sensing products possibly have uncertainties in examining or selecting the representative station in the urban and peri-urban areas. We derived UHI from traditional station classification groups and DTW-based station classification groups, and then examined the similarity of UHI sequences from satellite-based land surface temperature and air temperature to evaluate which station combinations are better for urban climate effect estimation.

The similarity analysis of air temperature sequences
According to the station location and land use information from remote sensing images, we divided the 20 stations into two groups: urban stations (PG, CP, SY, MTG, SJS, HD, CY, TZ, BJ, FT, DX, FS) and rural stations (THK, SDZ, FYD, YQ, HR, MY, ZT, XYL) ( Table 1). The 20 stations exhibited clear intra-group differences in the local environment, such as altitude and surrounding land cover types. Some stations were relocated due to the rapid urbanization, such as about 2 times relocation happened in the SY station, MTG station, SJS station, HD station, TZ station, and FS station (Table 1). The distances calculated by the DTW algorithm were quite different with the values ranging from 95.6 to 1560.1 according to the similarity analysis for air temperature sequences of 20 stations from 1987 to 2016 in Beijing (Fig. 2). As a lower distance appeared in temporal sequences of higher similarity, the similarity level among the same type of stations should be higher than that among different kinds of stations. However, 32% of station pairs among rural stations had a lower similarity level with the DTW distance greater than 800, while no urban station pairs could reach 800, especially larger differences appearing in those rural stations with higher elevation, such as the FYD station (Table 1 and Fig. 2). This phenomenon indicated that rural stations had larger differences from each other, and we should pay more attention to the selection of typical rural stations for urban climate analysis. According to the cumulative probability distribution analysis for similarity level results, lower values of the distance (less than 280) were concentrated at the left end of the horizontal axis, accounting for about 33% of all similarity level results (Fig. 3), which indicates that local environment difference of each station could greatly alter temperature variation. We further analyzed the detailed pattern of similarity level results influenced by urbanization processes in the last decades. Figure 2 provides a detailed pattern about similarity level results among urban and rural stations with Fig. 2 The distances derived from DTW algorithm for air temperature sequences from 1987 to 2016 of every pair of station in Beijing a DTW distance less than 280 to validate whether it was suitable for typical stations identification by spatial coverage derived from remote sensing images. As expected, some traditional rural stations cannot capture the real rural temperature trend because of their high similarity to urban stations, such as DTW distance by 167.3 and 216.7 for the MY station versus the PG station and the HR station versus the FS station, respectively. By contrast, traditional urban stations (BJ, CY, HD et al.) still had high similarity with each other, even some of them experienced relocation during the study period. This phenomenon proved that stations in the urban area had a relatively stable environment of local climate zone with large areas of impervious surface, while rapid urbanization mainly modified the natural surroundings of rural stations. We also found that some rural stations at the remote area of the municipality should not be considered as typical rural stations, such as the lower similarity of the FYD station with all other stations with the averaged distance of 1353.9 ± 171.9. The FYD station is located at the top of Foyeding Mountain with an altitude of 1224.7 m, and its temperature variation more closed to the air temperature trend from the radiosonde profile at this level.

Identification of typical urban and rural stations by DTW algorithm
Meteorological stations may be influenced by urbanization with stations' surroundings invaded by city infrastructures, such as roads, factories, and residential parts of a city, especially in the developing countries (Jia et al. 2015). The uncertainties of temperature variation are usually from two aspects: (1) warming signals induced by urbanization processes (Jia et al. 2015), and (2) the fluctuation of air temperature sequences resulted from station relocation (Yan et al. 2010). As its capability in analyzing the temporal similarity of data sequences, the DTW algorithm is also useful for detecting inhomogeneous signals of air temperature sequences, further selecting typical stations in urban climate effect evaluation. According to the location, the numbers of station relocation, altitude, and land cover of site surroundings, we selected two stable sites, the FT and SDZ stations, as the anchor stations (the referenced station) for typical urban and rural station identification and validation, respectively. First, we examined the similarity of the FT/SDZ stations with other stations to identify possible inhomogeneous conditions of these temperature sequences (Fig. 4). Except for the station combination of the FT station versus the PG station, a higher similarity was found among urban stations with a distance less than 200. The PG station should not be considered as a typical urban station with the similarity level values by 404.3; meanwhile, it was also relatively far away from the city limit of Beijing (Fig. 1). The SDZ station is a typical rural station that has not undergone the relocation, and its temperature change had higher similarity with other rural stations, such as DTW distance by 264 and 180 for the XYL station and ZT station, respectively. The similarity level analysis also suggests that the FYD station, MY station, and HR station were not appropriate to represent air temperature change in rural surfaces. According to the environmental evolution of urban stations and rural stations, we can conclude that urban stations usually maintained their relatively stable environment, while the surroundings of rural stations always experienced dynamic change with the increased urbanization signals, especially those stations located at the urban fringe. Urban land teleconnection analysis also suggests the traditional classification based on a discrete categorization of a rural-urban dichotomy could not meet the requirements of the continuum analysis for urban economy and sustainability (Yan et al. 2010). Therefore, the place-based conception for urban and rural station identification cannot fulfill the task for typical urban and rural station selection. According to these similarity level results from the FT station and SDZ station, we further divided 20 stations into three groups:

Validation for DTW algorithm
To validate the classification results about typical urban and rural stations derived from the FT station and SDZ station, we analyzed the similarity between each station and our classification group, then examined the percentage of each station similar to our classification group using the distance threshold of 280 (Fig. 5). As expected, the percentage  5 The percentage of each station similar to typical urban/ rural stations derived from referenced station and MY stations were not similar to neither the urban station nor rural station, while only 36.4% results from HR station and 9.1% results from the PG station were similar to the urban station. Therefore, our classification for the typical urban and rural stations based on the DTW algorithm was reasonable.
Two sets of station classification results in Beijing were also derived by using the cluster analysis method and identified by remote sensing images, respectively, which were used to validate DTW classification results. Figure 6 shows the station classification results in Beijing by the cluster analysis method according to the similarity results of the temperature sequence of each station from 1987 to 2016. Four station clusters were classified by the cluster analysis method, including cluster 1 (SY, HD, YQ, MY, PG, CY, CP, MTG, BJ, SJS, FT, DX, and FS), cluster 2 (FYD), cluster 3 (THK, HR, SDZ, ZT, XYL), and cluster 4 (TZ). According to the station locations, cluster 1 and cluster 3 were considered as the urban stations and rural stations, respectively, while cluster 2 and cluster 4 had larger differences between the urban station and rural station. Therefore, FYD and TZ were not appropriately used as data sources for urban heat island estimation. Table 1 has addressed station classification in Beijing according to land use/cover map derived from remote sensing images, and these stations were simply classified as urban and rural stations without fully considering the influence of urban expansion that happened in the urban fringe. We further compared the difference of the classification results from the DTW, cluster analysis method, and remote sensing (Table 2). Generally, 65% of these stations had the same station type among the results obtained by the three methods, and the large differences were mainly from the identification for stations located at the urban fringe and rural areas. Compared with the other 2 methods, the DTW algorithm presented more rigorous information for station selection by considering the similarity of data sequences variation and the representativeness of local environment, such as the difference for rural station identification in 3 methods. The selected rural stations (ZT, SDZ, XYL) by DTW algorithm were located at the natural surface and did not experience relocation by contrast with rural stations selected by the cluster analysis method and remote sensing. Those differences were mainly because of (1) the fact that mountain stations cannot represent surface climate trend (FYD), (2)  Stations Similarity Fig. 6 The dendrogram to display the station clusters distinguished by cluster analysis method for temperature sequences of each station from 1987 to 2016, and the tree diagram shows the similarity formed at each step in the amalgamation procedure big environmental evolution of traditional urban station in satellite towns (PG and TZ).

Comparison of UHI from new station groups and traditional station groups
We compared UHI derived from traditional station groups and new station groups from the DTW algorithm to identify the difference of urban climate effect induced by station selection (Fig. 7). Here, the typical mountain station, the FYD station, was also used to examine UHI variation altered by its altitude. Generally, the comparison of UHI derived from different station combinations in Beijing indicated that traditional station groups will underestimate UHI effects. Averaged UHI effects derived from station groups by the DTW algorithm reached 2.31 ± 0.51℃, while they decreased to 1.07 ± 0.57℃ by using traditional station groups. This phenomenon was probably caused by warming signals in rural stations induced by the urbanization process, and the environment investigation by remote sensing images indicates that urban land accounted for about 50% and 38% of the surroundings of the MY station and HR station. UHI effects derived from urban stations versus mountain stations reached 7.13 ± 0.79℃, with higher UHI effects than the other two combinations, mainly because air temperature greatly decreased with the rising altitude. UHI effects have intensified among three-station combinations from 1987 to 2016, and it increased by 2.13℃/decade and 0.95℃/decade by new station groups and traditional station groups, respectively.

Comparison of UHI from air temperature and satellite-based surface temperature
UHI is usually classified as surface urban heat island (SUHI), canopy urban heat island (CUHI), and boundary urban heat island (BUHI) according to their characterization in different layers of the urban atmosphere (Hu et al. 2019;Yuan and Bauer 2007). SUHI can be derived from land surface temperature records, while CUHI is calculated by the difference between urban and rural temperature records. Annual SUHI and CUHI records from different mega-cities have similar trends and close magnitude (Hu et al. 2019). Surface temperature from the urban and rural surfaces was adopted in quantifying SUHI, which provided more information than single stations. The SUHI from 2003 to 2016 was quantified by the land surface temperature difference between urban and rural surfaces (Hu et al. 2019). We examined the similarity between SUHI and CUHI sequences, and found the similarity between SUHI sequences and CUHI from new station groups reaching 81.94, which was less than that from SUHI versus These results suggest that the temporal change and overall magnitude of UHI from new station groups identified by the DTW algorithm more approached the SUHI results, which proved that the new station group was helpful for UHI estimation. The DTW algorithm showed special advantages in selecting the typical station groups for comparing UHI results across regions, especially for rapid urbanization regions. In those regions, the urban fringe was considered as a transition zone from urban to rural areas with high interaction between urban expansion and rural activities, and it could become the fastestchanging region with its rapid land surface transformation. In fact, it is hard to exactly distinguish the border of suburban, urban fringe, and rural areas according to the urban-rural continuum characterization. Remote sensing images may provide exact spatial information. However, we may not be sure whether meteorological stations have already experienced the transformation of station type because of the possible warming signals induced by the urbanization process from near-surface urban land (Hu et al. 2019). Thus, we think it is better for typical urban and rural station selection by considering spatial coverage information from remote sensing and temporal change information by the DTW algorithm.

Conclusion
The dynamic time warping algorithm (DTW) was investigated in this study to improve the selection of urban and rural stations for UHI estimation. The similarity of temperature sequences was derived from air temperature sequences from 1987 to 2016 of 20 meteorological stations in Beijing using the DTW algorithm. Then, we analyzed the station combinations with high similarity of temperature sequences using the cumulative probability distribution analysis, further dividing those stations into three groups to identify typical urban stations (FT, SY, HD, TZ, CY, CP, MTG, BJ, SJS, DX, FS) and typical rural stations (ZT, SDZ, XYL). According to the similarity level analysis and validation by remote sensing images and cluster analysis method, new station groups for UHI estimation are reasonable. Meanwhile, this method is also helpful in detecting inhomogeneous signals of temperature sequences of traditional rural stations, and we found that the PG station, MY station, and HR station are not suitable for representing temperature conditions in the rural surface anymore because they have been influenced by urbanization process. We also found that mountain stations, such as the FYD station, have big differences from the rural station and urban station, and they aren't appropriate to analyze urban climate effect.
Our study analyzed the bias in the selection of typical urban and rural stations, and the results proved that dynamic urbanization processes could cause potential uncertainties for urban land identification, especially for those stations located at urban fringe who experienced rapid urbanization. Therefore, the DTW algorithm analysis for air temperature sequences may provide another simple way to understand their representativeness. Another advantage of this method is its fewer requirements for background information of meteorological stations. Therefore, it is more convenient for researchers to understand the station situation in a new region, even on regional or global scales. Certainly, it also has some limitations or uncertainties when fewer samples of meteorological stations are adopted in this algorithm, and in this case, we should know more background information.