A quantitative method for the similarity assessment of typhoon tracks

Typhoons are among the most dangerous natural hazards; they develop in the western and southwestern Pacific Ocean and pose economic and human security threats to the Pacific Rim every year. Therefore, many scholars in related fields devote themselves to finding an effective way to analyze and forecast typhoon tracks to prevent disasters. Similarity analysis of typhoon tracks can provide great help for typhoon prediction. In this paper, a model for typhoon similarity analysis is proposed to measure and quantify the similarity between two historical typhoon tracks based on the dynamic time warping algorithm, in which five typhoon elements (longitude, latitude, central pressure, expanded Beaufort scale, and movement speed) are integrated to derive a final similarity percentage indicating the similarity level. At the end of this paper, case studies concerning historical typhoons and the ongoing Typhoon 202106 In-Fa are conducted to verify the validity and effectiveness of the proposed model. The results show that the proposed model can effectively quantify the similarity of two typhoon tracks, functions well on ongoing typhoons when a cutoff rule is applied, and provides promising support for typhoon prediction.


Introduction
A typhoon is a kind of tropical cyclone whose central wind speed can easily exceed 17.2 m/s. Typhoons develop on the surface of the western and southwestern Pacific Ocean and bring ample water vapor to the Pacific Rim, where they usually cause severe convective weather, such as heavy rains and storm surges. Thus, typhoons are widely recognized as a devastating natural phenomenon that imperils human safety and economic development in coastal areas. For instance, in 2019, super Typhoon 201909 Lekima caused 48 deaths or disappearances in China, with an economic loss of 40.71 billion yuan (Zhou 2021). Because of this destructiveness, typhoons have become a hot research topic, and many previous studies in this field have focused on typhoon track analysis in order to issue early warnings of developing typhoons and reduce injuries and social loss.
Some studies on typhoon tracks pay close attention to track prediction, whose main methods originate from numerical prediction. Numerical prediction is also known as forecasting technology based on typhoon dynamics, and all analyses are centered on dynamic equations involving meteorological factors and physical processes. Hong et al. (2007) serialized the 500 hPa geopotential height field of the T106 numerical forecast product and applied the empirical orthogonal function to these data to optimize the parameters of the T106 model in parallel and globally. Zhang et al. (2017) attempted to incorporate large-scale climate change models, such as the Pacific Meridional Model, the Atlantic Meridional Model, and the North Atlantic Sea Surface Anomaly Temperature, as forecast factors in the statistical dynamics approach and improved the effects to a certain degree. Other research has attempted to think outside the meteorological methods, which generalize the generation and development principles of typhoons through various kinds of meteorological models. Such studies have also become an important part of typhoon analysis. In the 2010s, artificial intelligence (AI) achieved great success in typhoon forecasting. Hong et al. (2017) applied convolutional long short-term memory to a new typhoon path prediction model, which is helpful in processing spatial data over time. Gao et al. (2018) trained a long short-term memory (LSTM) neural network to establish a forecast model based on machine learning for the prediction of typhoon tracks. Jiang et al. (2018) proposed two algorithms based on machine learning neural networks that can be used with atmosphere-only typhoon forecast models to provide flow-dependent typhoon-induced sea surface temperature cooling for improving typhoon predictions.
Alemany (2019) rasterized the region of the Atlantic Ocean at a resolution of 1° × 1°, coded the grid cells with fixed numbers into which points of typhoon paths were categorized, and then predicted the grid number with an LSTM model. With the rapid development of geographic information systems (GISs), typhoon analysis studies have been inspired by new GIS methods (Moayedi et al. 2019; Alizadeh et al. 2020), as geographic methods also have great potential for enhancing disaster management (Li et al. 2021a, b) and reproduction (Li et al. 2021a, b). Geographers have come up with the idea that the similarity between two typhoons is essential to typhoon research because the similarity analysis of historical typhoons is a great support and supplement to typhoon track prediction. According to the First Law of Geography proposed by Tobler (1970), "everything is related to everything else, but near things are more related than distant things"; similar typhoons are therefore highly likely to travel along similar trajectories with similar natural properties, and similar historical tracks can serve as references for predicting the tracks of ongoing typhoons. Zou et al. (2008) built a system based on the key point similarity method and buffer spatial analysis for fast forecasting of typhoon track trends within 24 to 48 h. This study determined the track similarity via the quadratic ratio of the distance between corresponding key points and the radius of the circular buffer, and it was considered pioneering work in GIS-aided typhoon prediction. Hsu et al. (2014) developed a GIS-based decision support system, which integrated real-time rainfall monitoring and forecast information, for enhancing emergency operations during typhoon attacks. Wu et al. (2014) first put forward the Information-Matter-Element Model and established a GIS system for rainfall-based landslide hazard prediction. Bui et al. (2019) proposed and verified a new soft computing approach based on multivariate adaptive regression splines and particle swarm optimization for the spatial prediction of flash-flood-susceptible areas in a high-frequency tropical typhoon area. Wang et al. (2021) semi-quantified the risk assessment of storm surge under different typhoon intensities and performed the assessment in ArcGIS with the coupled model ADCIRC-SWAN and the Jelesnianski method.
However, methods utilizing GIS technology result in the following problems: On one hand, more stress is placed on geodesic elements such as longitudes and latitudes in the GIS field, which does not give enough consideration to other factors of typhoons; on the other hand, the processing of geographic data consumes more time and memory resources than normal data without geoinformation. For this reason, some scholars try to introduce concise mathematical models that lower the calculation power demand and quicken operation speed. She et al. (2014) abstracted the typhoon track as a plane curve and evaluated similarity in terms of the shapes of typhoon tracks and numerical indices of track points. Huang et al. (2019) vectorized typhoon data and applied principal component analysis and dynamic time warping (DTW) to portray similarity. However, these studies mostly lack explanations for the final results, and some of them are inclined to check the integrity of typhoon data and interpolate the missing data instead of performing similarity analysis. Thus, this paper tries to fill the gap of interpreting similarity intelligibly.
Looking at typhoons that have developed in the past decade, this paper serializes the original typhoon data, calculates the transitional similarity distance between two typhoons based on the DTW algorithm and, for the first time, proposes a new metric that quantifies the similarity degree by normalizing the corresponding similarity distance with the modified hyperbolic tangent function.
The other sections of this paper are organized as follows. Section 2 introduces the typhoon data used throughout and establishes the model for subsequent analysis. Section 3 presents the detailed procedure of the DTW-based algorithm that this paper utilizes. Section 4 conducts the designed experiments, discusses the results and parameters, and attempts to filter typhoons similar to the currently ongoing Typhoon 202106 In-Fa. Finally, Sect. 5 concludes the current research and proposes future research.

Data introduction
The typhoon data from 2010 to 2020 utilized in this paper were obtained from the Real Time Release System of Typhoon Track (http://typhoon.zjwater.gov.cn/default.aspx) provided by the Ministry of Water Resources of Zhejiang Province, China. After data gathering and pretreatment, the data are transformed into a JSON-formatted file for further analysis, which consists of 272 key-value pairs, as shown in Fig. 1.
Each key-value pair is composed of one key and one value and is coded as "key": value, where the key is the index of the typhoon and the value is a list of lists ordered by time, each of which contains 6 elements denoted as $(T_i, Lon_i, Lat_i, P_i, L_i, V_i)$, representing the time elapsed after generation in hours (h), the longitude and the latitude of the central cyclone (unitless), the central pressure in hectopascals (hPa), the grade of wind speed in the expanded version of the Beaufort scale (unitless), and the movement speed of the typhoon in kilometers per hour (km/h), respectively.
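As an illustration of this layout, the following minimal Python sketch parses a small JSON text in the described shape and drops the time indicator. The sample keys and values and the helper name `load_tracks` are hypothetical and are not taken from the actual data file.

```python
import json

# Hypothetical sample in the described layout:
# {"typhoon index": [[T, Lon, Lat, P, L, V], ...]}; the values are invented.
SAMPLE = """
{
  "000001": [[0, 130.0, 15.0, 1000, 7, 15], [6, 129.1, 15.6, 998, 8, 18]],
  "000002": [[0, 140.2, 12.3, 1002, 6, 20]]
}
"""


def load_tracks(text):
    """Return {index: [(Lon, Lat, P, L, V), ...]} ordered by elapsed time."""
    raw = json.loads(text)
    tracks = {}
    for index, records in raw.items():
        ordered = sorted(records, key=lambda record: record[0])
        # The time indicator T (first element) is dropped because it does
        # not take part in the later distance calculation.
        tracks[index] = [tuple(float(v) for v in record[1:]) for record in ordered]
    return tracks


tracks = load_tracks(SAMPLE)
print(len(tracks), "typhoons loaded")
```

Keeping each observation as a five-element tuple matches the five-dimensional vectors used in the distance function later on.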
The original data were not recorded at equal time intervals: early observation records were taken every 6 h, while records from the most recent five years are usually taken every 3 h, or every 1 h when a typhoon is about to make landfall or intensify. For this reason, a time indicator ($T_i$) is attached to each datum. Note that it is not involved in subsequent calculations, as explained later when the method is introduced.

Modeling and data preprocessing
To describe typhoons comprehensively, a concise definition is given as

$$T = (p_1, p_2, \ldots, p_n) \tag{1}$$

where $T$ is the typhoon time series; $n$ is the length of the typhoon time series; and each $p_i$ ($i = 1, 2, 3, \ldots, n$) is a multidimensional vector consisting of the same number of observation values of a certain typhoon at different times, which makes $T$ a multidimensional time series.
Owing to the lack of observational measurements, some components of a vector $p_i$ are marked as −1 to indicate that they are missing; such a $p_i$ is called a missing vector, while a vector whose components are all nonnegative is viewed as a complete vector. To lower the error between a series with missing vectors and one without, two preprocessing criteria are adopted: (a) If the number of missing vectors in a typhoon time series is greater than or equal to 4, the series is considered unusable for further research and is discarded. (b) Otherwise, a simple moving average is applied to the typhoon time series to make all vectors complete. This can be formulated as

$$p_k = \frac{1}{n} \sum_{i=1}^{n} p_i \tag{2}$$

where $p_k$ is the missing vector; each $p_i$ is a complete vector that is adjacent to $p_k$ within 2 indices; and $n$ is the number of these complete vectors.
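The two criteria can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments; the function names and the choice to return `None` when no complete neighbour exists are assumptions.

```python
def is_complete(vec):
    """A vector is complete when none of its components is negative (-1 marks a gap)."""
    return all(v >= 0 for v in vec)


def preprocess(series, max_missing=3, window=2):
    """Apply criteria (a) and (b); return the filled series or None if discarded."""
    missing = [k for k, vec in enumerate(series) if not is_complete(vec)]
    if len(missing) > max_missing:  # criterion (a): 4 or more missing vectors
        return None
    filled = [list(vec) for vec in series]
    for k in missing:
        lo, hi = max(0, k - window), min(len(series), k + window + 1)
        neighbours = [series[i] for i in range(lo, hi)
                      if i != k and is_complete(series[i])]
        if not neighbours:
            return None  # assumption: nothing to average, so discard the series
        # criterion (b): component-wise mean of complete vectors within 2 indices
        filled[k] = [sum(column) / len(neighbours) for column in zip(*neighbours)]
    return filled


demo = preprocess([(120.0, 20.0), (-1, 21.0), (122.0, 22.0)])
print(demo[1])
```

Note that, following Formula (2), the whole missing vector is replaced by the average of its complete neighbours, including components that were originally observed.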

Methodology
In this section, a method of estimating and quantifying the similarity between two typhoon time series based on the dynamic time warping (DTW) algorithm is expounded in detail. DTW was first proposed around 1960 (Bellman and Kalaba 1959) and developed rapidly in the field of speech recognition in the 1970s (Sakoe and Chiba 1978; Myers et al. 1980). It is a high-performance algorithm with noteworthy robustness that adopts the idea of dynamic programming to accumulate the shortest warping distance between two time series whose lengths are unequal (Müller 2007). Because different typhoons exist for different lengths of time, the lengths of the corresponding time series are not equal. Conventional time series processing algorithms, however, require time series of the same length, so the two typhoon time series would have to be fitted and resampled to force them to the same length, which reduces accuracy. The DTW algorithm avoids this problem. Data handled by the DTW algorithm are generally sampled at equal time intervals, but as mentioned in Sect. 2.1, the observation intervals of typhoons are usually unequal. Giorgino (2009) pointed out that DTW is "stretch-insensitive," for it is insensitive to the prolonged duration of tones when applied to speech recognition. Consequently, the inconsistent time intervals of typhoon records can be viewed as different degrees of stretching of an equally spaced timeline. For this reason, time indicators are trivial factors in the DTW algorithm.

Purpose and symbol definition
Given two historical typhoons, this section proposes a method to analyze the corresponding typhoon time series and to indicate the difference between them. To simplify the description of the method, some definitions of symbols are given as follows.
(a) The two historical typhoon time series are denoted as

$$B = (b_1, b_2, \ldots, b_n), \qquad C = (c_1, c_2, \ldots, c_m)$$

where $B$ is named the base time series, whose length is $n$, and $C$ is named the compared time series, whose length is $m$; in most cases, which series serves as the base is a matter of circumstance. Each component of $B$ and $C$, namely $b_i$ and $c_j$, is a multidimensional vector $p_i$ as mentioned in Sect. 2.

(b) The distance function between two multidimensional vectors $p^1$ and $p^2$ is defined as

$$\mathrm{dis}(p^1, p^2) = \sum_{i=1}^{n} f_i \left(p^1_i - p^2_i\right)^2 \tag{5}$$

where $p^1_i$ is the i-th component of the vector $p^1$; $p^2_i$ is the i-th component of the vector $p^2$; $f_i$ is the positive weight coefficient of the i-th component, with $\sum_{i=1}^{n} f_i = 1$; and $n$ is the number of dimensions of the vector.

Considering that all elements in the cost function matrix are quadratic in dimension, the square root of its last element is taken as the similarity distance to make the unit of the result linear (Formula (8)).

Soboleva and Beskorovainyi (2008) introduced a modified hyperbolic tangent function, originally for multiobjective optimization in decision-making. This function is defined as

$$\mathrm{mtanh}(x) = \frac{e^{ax} - e^{-bx}}{e^{cx} + e^{-dx}} \tag{9}$$

which can be viewed as a generalized form of the hyperbolic tangent function, as it reduces to the ordinary tanh function when $a = b = c = d = 1$. This function is widely used in AI fields as an activation function to increase the nonlinearity of neural networks (Golev et al. 2017).
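The procedure can be sketched in Python as below. The sketch assumes the standard DTW recursion, in which each cell of the cost matrix adds the local distance to the minimum of its three predecessors; the exact step pattern used in the experiments may differ, and the function names are illustrative.

```python
import math


def dis(p1, p2, weights):
    """Weighted squared distance between two observation vectors (Formula (5))."""
    return sum(f * (a - b) ** 2 for f, a, b in zip(weights, p1, p2))


def similarity_distance(base, compared, weights):
    """Square root of the last element of the accumulated DTW cost matrix."""
    n, m = len(base), len(compared)
    # One extra row and column of infinities act as the boundary condition.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dis(base[i - 1], compared[j - 1], weights)
            # Assumed step pattern: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


track_b = [(0.0, 0.0), (1.0, 1.0)]
track_c = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
print(similarity_distance(track_b, track_c, (0.5, 0.5)))
```

The two series may have different lengths, which is the property that motivates the use of DTW here; a repeated observation in one track, as in `track_c`, adds no cost.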

Normalization of similarity
Given that the mtanh function inherits the features of the tanh function better when the 4 parameters $a$, $b$, $c$, and $d$ are all equal to one positive real number, denoted $q$, a simplified function for normalizing the similarity distance to a number in the range (0, 1] is defined in Formula (10), where $p = e^q > 1$ is named the simplification parameter. To derive a clear and comprehensible index for evaluating the similarity between two typhoon time series, a function named the similarity percentage, $f(x; p)$, is defined in Formula (11), where $x$ is the similarity distance derived from Formula (8) and $p$ is the simplification parameter mentioned in Formula (10).
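As a hedged illustration, the Python sketch below assumes that the similarity percentage takes the form 100% × (1 − tanh(x ln p)). This is one function consistent with the properties stated in the text (100% at zero distance, strictly decreasing in the distance, and p = e^q > 1), but the exact form of Formulas (10) and (11) may differ.

```python
import math


def similarity_percentage(x, p=1.005):
    """Map a similarity distance x >= 0 to a percentage in (0, 100].

    Assumed form: 100 * (1 - tanh(x * ln p)), i.e. tanh with a = b = c = d = q
    and p = e**q. This is an illustrative assumption, not the paper's formula.
    """
    if x < 0:
        raise ValueError("the similarity distance must be nonnegative")
    if p <= 1:
        raise ValueError("the simplification parameter must exceed 1")
    return 100.0 * (1.0 - math.tanh(x * math.log(p)))


for distance in (0.0, 50.0, 200.0):
    print(distance, round(similarity_percentage(distance), 1))
```

Under this assumption, a larger simplification parameter makes the percentage fall off faster as the distance grows.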

Experiment design
The typhoon time series that this section utilizes are extracted from Sect. In Sect. 3.1, Formula (5) introduced n coefficients $f_i$ (i = 1, 2, 3, …, n), which can be considered the significance of the different elements of typhoons. Here n = 5, since the vectors involved in the distance function are all five-dimensional: they contain the elements mentioned in Sect. 2.1 but not the time indicator, as explained at the beginning of Sect. 3. Among the five elements, the longitude and the latitude are required, and the other three elements are optional; the selection is not fixed and can be adjusted flexibly in practical use. According to empirical knowledge of typhoons and experts' advice, within this experiment the importance of geodesic elements (longitude and latitude) is emphasized, while the importance of dynamical elements (movement speed) is reduced. Because a typhoon is a synoptic system, meteorological elements (central pressure and Beaufort scale) are weighted more than dynamical elements but less than geodesic elements. In summary, these coefficients are assigned values as shown in Table 1.
Therefore, the expanded form of the distance function is given by Formula (12). Meanwhile, the simplification parameter p introduced in Formula (10) is set to 1.005, which will be discussed later.
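For a concrete sense of scale, the Python sketch below evaluates the expanded distance function with the coefficients that appear in Formula (12), namely 0.26, 0.26, 0.22, 0.14, and 0.12. The two observation vectors are invented for illustration and do not come from the data set.

```python
# Coefficients as they appear in Formula (12); they sum to 1.
WEIGHTS = {"Lon": 0.26, "Lat": 0.26, "P": 0.22, "L": 0.14, "V": 0.12}


def dis(p1, p2, weights=WEIGHTS):
    """Weighted sum of squared differences over the five typhoon elements."""
    return sum(w * (p1[key] - p2[key]) ** 2 for key, w in weights.items())


# Hypothetical observations: longitude, latitude, pressure (hPa),
# Beaufort grade, and movement speed (km/h).
obs_a = {"Lon": 125.0, "Lat": 20.0, "P": 980.0, "L": 10.0, "V": 20.0}
obs_b = {"Lon": 126.0, "Lat": 21.0, "P": 975.0, "L": 11.0, "V": 15.0}

print(dis(obs_a, obs_b))
```

In this example, the 5 hPa pressure gap and the 5 km/h speed gap contribute far more than the one-degree position gaps, because the elements are combined in their raw units; this is worth keeping in mind when choosing coefficients.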

Result presentation and analysis
After calculating the similarity distance and the similarity percentage between the base time series and each typhoon in the compared time series set, 6 typhoons listed in Table 2 are selected as representative typhoons and plotted in Fig. 2. Note that Typhoon 202018 serves as the base time series; hence, its similarity distance is zero and its similarity percentage is 100%, because a time series is identical to itself.
The fundamental principle of similarity normalization is that the shorter the similarity distance between the compared time series and the base one is, the higher the percentage this pair takes, since the similarity distance is a positively correlated metric measuring the contrast of two time series, and the normalization function is strictly monotonically decreasing, which is corroborated in Fig. 2.
$$\mathrm{dis}(p^1, p^2) = 0.26\,(Lon_1 - Lon_2)^2 + 0.26\,(Lat_1 - Lat_2)^2 + 0.22\,(P_1 - P_2)^2 + 0.14\,(L_1 - L_2)^2 + 0.12\,(V_1 - V_2)^2 \tag{12}$$

The paths of these 6 typhoons are mapped in Fig. 3 using the Mercator projection. From the map, it can be intuitively perceived that Typhoon B, which has a similarity percentage of over 80%, is the closest to Base Typhoon A, while the northern-trajectory Typhoons D, E, and F, with longer similarity distances, are huddled together and roughly perpendicular to the base typhoon overall, with Typhoon C following a similar but outlying path relative to the base one.

In this experiment, a laptop equipped with an Intel® Core™ i5-8265U CPU @ 1.60 GHz, 8 GB of RAM @ 2400 MHz, and a 256 GB SSD runs Windows 10 Home 64-bit, Python 3.8.0, and gcc (MinGW.org GCC-8.2.0-3) 8.2.0. The code is written in a hybrid Python-C style, in which Python is used to load and pretreat data, and C is used to accelerate the calculation of similarity distances and percentages. The Python package time is used to determine the runtime of each language side. With the time series of Typhoon A selected as the base time series and all other 52 time series from 2019 or 2020 forming the compared time series set, over 50 repeated experiments the Python side took 6.829 ms and the C side took 60.730 ms on average, which means that results are calculated almost instantly.

Parameter discussion
In Fig. 3, given that geographic similarity is emphasized, as mentioned in Sect. 4.1, it appears slightly contradictory that Typhoon E has a higher similarity percentage than Typhoon F, while E is farther away from A (the base typhoon) than F. The reason is that the similarity is also affected by elements other than the geodesic ones. Section 4.1 introduces the significance ranking of the coefficients of different elements and assigns them different values. The idea is that the more significant an element is, the more strongly it is expected to discriminate between typhoons. A second experiment is conducted without the influence of meteorological and dynamical elements to resolve the visual contradiction. The comparison is shown in Table 3, which illustrates that the extent of each element's influence can be customized by modifying its coefficient.
In addition, the five coefficients mentioned in Table 1 can be assigned different values according to different uses. From the perspective of meteorologists, the similarity in meteorology is more significant than in geographic morphology, and the assignment is shown in Table 4.
Typhoon 201909 Lekima is selected as the base time series, and the compared time series set is expanded to cover all typhoons from 2018 to 2020 except discarded typhoons. With the simplification parameter p remaining unchanged, the top 4 typhoons with the shortest similarity distances, including the base typhoon, are chosen as representative typhoons, and their similarity information is plotted in Fig. 4. These typhoons are Typhoon 201909 Lekima coded as "G," Typhoon 202010 Haishen coded as "H," Typhoon 201808 Maria coded as "I," and Typhoon 202009 Maysak coded as "J." As in the original experiment, the similarity percentage of the base time series remains 100% because of the zero similarity distance. Paths of these typhoons are mapped in Fig. 5.

Table 3 The comparison between similarity distances and percentages of Typhoons E and F before and after modifying the coefficients
Base Typhoon G developed at 17:00 (UTC + 8, and the same below) on August 4, 2019, on the ocean surface east of the Philippines and then traveled northwestward, rapidly intensifying from a strong tropical storm to a super typhoon within 24 h. At 1:45 on August 10, it made landfall on the coast of Wenling, Zhejiang Province, China, with a maximum wind speed of 52 m/s rated Grade 16 and a minimum central pressure of 930 hPa, and it gradually weakened after landfall. Typhoon J developed at 17:00 on August 28, 2020, on the ocean surface east of the Philippines and moved southwest. At 5:00 on September 1, it intensified into the first super typhoon of 2020 and then shifted to the east-northwest and began to weaken, making landfall on the coast of Busan, Gyeongsangnam-do, South Korea at approximately 1:30 on the 3rd with a wind speed of 42 m/s rated Grade 14 and a central pressure of 950 hPa. Typhoon H developed at 20:00 on September 1, 2020, in the northwest Pacific Ocean, intensified to a super typhoon at 5:00 on the 4th and landed on the southern coast of

However, a deficiency is reflected in Fig. 5: The similarity percentages are not clearly discriminative. Since the similarity percentage is calculated by Formula (11), the simplification parameter p is the decisive parameter that varies the result. Plots of Formula (11) with different simplification parameters are shown in Fig. 6.
It is easy to prove that

$$\forall x > 0, \quad f(x; p_1) < f(x; p_2) \tag{13}$$

where $p_1 > p_2 > 1$. This result can be interpreted as follows: the plot of $f(x; p)$ nests toward the origin point (0, 0%) as p increases. In addition, the role of p is to stretch the plot over different ranges; for example, the plot of $f(x; 1.4)$ is suited to similarity distances in the range (0, 10]. Since the purpose of adjusting the simplification parameter p is to distribute similarity distances over the range (0, 1], and the data in the original experiment lie in the range (0, 500], a remarkable effect can be achieved when p < 1.4.
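A short derivation of Formula (13) can be written out under an assumption about the form of the similarity percentage. The form below is assumed for illustration only: it is consistent with the stated properties of f, but it is not confirmed as the exact definition in Formula (11).

```latex
% Assumed form (illustrative): f(x;p) = 100\% \cdot (1 - \tanh(x \ln p)), with p > 1.
\begin{align*}
p_1 > p_2 > 1
  &\;\Longrightarrow\; \ln p_1 > \ln p_2 > 0 \\
  &\;\Longrightarrow\; x \ln p_1 > x \ln p_2 \quad \text{for all } x > 0 \\
  &\;\Longrightarrow\; \tanh(x \ln p_1) > \tanh(x \ln p_2)
     \quad \text{(tanh is strictly increasing)} \\
  &\;\Longrightarrow\; 1 - \tanh(x \ln p_1) < 1 - \tanh(x \ln p_2) \\
  &\;\Longrightarrow\; f(x; p_1) < f(x; p_2).
\end{align*}
```

The same argument applies to any form in which p enters only through a strictly increasing function of x ln p that is then subtracted from a constant.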

Fig. 5 The paths of the most similar typhoons, annotated with their codes and similarity percentages, and the base typhoon are displayed on the map. The abbreviation "SP" means "Similarity Percentage"

Similarity analysis on ongoing typhoon
Typhoon 202106 In-Fa formed at 2:00 on July 18, 2021, in the northwest Pacific Ocean. It traveled anticlockwise and then moved northwest when it was approximately at 125.6° E, 23.5° N at 1:00 on the 23rd. Given that Typhoon 202106 is still ongoing at the time of writing (19:00, July 24, 2021), when it is chosen as the base typhoon, a cutoff rule is proposed as follows to extract the valid parts of historical typhoons: The valid typhoon extraction begins with the point of generation and ends with the point that is nearest to the latest point of the ongoing typhoon under WGS84 (World Geodetic System 1984), covering all points within this segment.
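The cutoff rule can be sketched in Python as follows. As a simplifying assumption, the sketch measures distance with the haversine formula on a sphere of mean Earth radius instead of on the WGS84 ellipsoid; the function names and the sample track are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical stand-in for WGS84 (assumption)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cut_off(historical, latest_point):
    """Keep a historical track from its first point up to the point nearest to
    the latest (lon, lat) position of the ongoing typhoon, inclusive."""
    lon0, lat0 = latest_point
    nearest = min(
        range(len(historical)),
        key=lambda i: haversine_km(historical[i][0], historical[i][1], lon0, lat0),
    )
    return historical[: nearest + 1]


# Hypothetical historical track as (lon, lat) pairs ordered by time.
track = [(130.0, 15.0), (128.0, 18.0), (126.0, 22.0), (124.0, 26.0), (122.0, 30.0)]
valid = cut_off(track, (125.6, 23.5))
print(len(valid), valid[-1])
```

Only the first two components of each observation are read, so the same function works on full five-element vectors as long as longitude and latitude come first.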
Since an ongoing typhoon represents only the initial part of its full life cycle, and its data remain incomplete until it dissipates, the track is the main factor to consider; the coefficients of longitude and latitude are each assigned 0.5, and other elements are ignored in this experiment. With the simplification parameter p remaining 1.005, Typhoon 202106 is selected as the base typhoon, the top 5 similar typhoons are selected as representative typhoons, and their similarity distances and percentages are listed in Table 5.
Tracks of these 6 typhoons are also mapped in Fig. 7. The cutoff points are located where solid lines turn into dashed ones: solid segments are the valid typhoon extractions, and dashed segments are the remaining parts of the corresponding historical typhoons.
As shown in Fig. 7, all five similar typhoons have similarity percentages over 90% and can be considered promising references for predicting the ongoing Typhoon K. Since Typhoons L, O, and P made landfall on the coast of Zhejiang Province, China, Typhoon K is also expected to make landfall in the same area. Furthermore, as Typhoon K develops, the algorithm will adjust the similarity calculation, and by then, Typhoons M and N would drop out of the list of similar typhoons.
Therefore, this experiment also corroborates that the method with the cutoff rule functions well in handling ongoing typhoons and finding similar typhoons that have certain referential value.

Fig. 7 The paths of the top 5 similar typhoons, annotated with their codes and similarity percentages, and the base typhoon are displayed on the map. The abbreviation "SP" means "Similarity Percentage"

Conclusion
The definition of typhoon similarity has not been clearly and accurately determined by academic circles. As a result, the method proposed in this paper contributes to first quantifying and normalizing the similarity between two historical typhoons based on the dynamic time warping algorithm. Experiments illustrate that the method can make a distinction between typhoons and filter the relatively more similar typhoons under certain parameter specifications, such as Typhoon B with an 82.6% similarity and Typhoon C with a 68.3% similarity, and also demonstrate the feasibility that this method is able to process various types of historical and ongoing typhoon time series data while maintaining appreciable objectivity and accuracy. Moreover, the method shows great potential in seeking similar typhoons with ongoing typhoons and supporting typhoon prediction. The parameters involved with this algorithm are highly customizable in light of users' needs. However, this research still lacks a rational explanation for the similarity distance in physics and a systematic criterion to assign parameters reasonably and purposefully. In the future, a series of parameter recommendations need to be formulated for usages in different circumstances and fields. With the increase in typhoon data collected and the support of machine learning technology, a method for forecasting the path of a developing typhoon based on this research also requires investigation for practical application.