Free congruence: an exploration of expanded similarity measures for time series data

Time series similarity measures are highly relevant in a wide range of emerging applications including training machine learning models, classification, and predictive modeling. Standard similarity measures for time series most often involve point-to-point distance measures including Euclidean distance and Dynamic Time Warping. Such similarity measures fundamentally require the fluctuation of values in the time series being compared to follow a corresponding order or cadence for similarity to be established. This paper is spurred by the exploration of a broader definition of similarity, namely one that takes into account the sheer numerical resemblance between sets of statistical properties for time series segments irrespective of value labeling. Further, the presence of common pattern components between time series segments was examined even if they occur in a permuted order, which would not necessarily satisfy the criteria of more conventional point-to-point distance measures. Results were compared with those of Dynamic Time Warping on the same data for context. Surprisingly, the test for the numerical resemblance between sets of statistical properties established a stronger resemblance for pairings of decline years, with greater statistical significance than Dynamic Time Warping achieved on the particular data and sample size used.


INTRODUCTION
Similarity measures between datasets are pertinent to virtually every area of science and computing, with applications ranging from classification and machine learning to forecasting and beyond. To date, many studies have been conducted on similarity measures as specifically applied to time series data. Pattern sequence similarity has been of particular interest in forecasting series of economic significance, such as electricity demand and solar power output [1], [2]. A common approach is to make use of distance measures to identify samples of a time series similar to a period of interest, and to use the most similar historical samples to inform a forecast of subsequent changes. Such approaches can be generally categorized as clustering methods. Distance measures are a way of quantifying pattern similarity between time series by measuring the proximity between the points they comprise. Sun et al. [3] employ point-to-point Euclidean distance in conjunction with Angle Cosine for clustering similar segments of time series data in the context of wind power forecasting, namely by using the similar samples as training data for a neural network. In another study, Bandara et al. [4] also apply distance-based clustering to the training of a neural network in the context of forecasting. Dynamic Time Warping [5], commonly abbreviated as DTW, is an applicable similarity measure when the resemblance between time series isn't temporally synced. The minimum degree of "warping" necessary to reasonably sync the series for a best match (in terms of point-to-point distance) can serve as a similarity measure. Yu et al. [6] demonstrate the applicability of DTW alongside a neural network to forecast peak load in power demand. Studies have also investigated the use of DTW solely for time series classification purposes without the forecasting element [7], [8], [9]. As demonstrated in a review of similarity measures by Serrà et al. [10], point-to-point distance measures are commonly implemented on the features of time series data rather than the raw time series itself, for example by using Fourier coefficients.
In a separate category from the aforementioned research, some work has also been done regarding comparison of local statistical test results to detect changes in a time series. Kosiorowski et al. [11] use an adaptation of the Wilcoxon rank sum test for detecting structural changes in samples of a time series. Tang et al. [12] compare the statistical properties of time series segments classified into flow states. In their review, Zou et al. [13] detail statistical similarity between time series segments as a rationale for choosing proximity networks such as recurrence networks [14] for time series analysis. Still, the use of local statistical tests on time series samples in the specific context of similarity measures or for change detection remains less frequently studied.
Whether applying similarity measures for forecasting or classification purposes, the majority of studies approach similarity measures in terms of point-to-point distance such as Euclidean distance or DTW. Both aforementioned methods require the rise and fall of values in the time series being compared to follow a similar order or succession, even if the resemblance isn't temporally synced (as is the main application for DTW).
In order to illustrate how the notions of similarity explored in this paper are distinct from conventional similarity measures, we must first clearly define what will be called "super-sequences" and "fluctuation subsequences" going forward. A super-sequence is simply defined as a sequence of values within a temporal range of a time series delineated by a start and an end, in this case a date range. It can be thought of simply as a historical segment of a time series. A fluctuation subsequence will be defined as the sequence of percent-change magnitudes from value to value in a proper subsequence of the super-sequence. Going forward, fluctuation subsequences will be treated as attributes of super-sequences.
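As a concrete sketch of this definition (function and variable names are our own, not taken from the study), a fluctuation subsequence can be computed from a subsequence of raw values as follows:

```python
def fluctuation_subsequence(values):
    """Sequence of percent-change magnitudes from value to value for a
    (proper) subsequence of a super-sequence. Names are illustrative."""
    return [abs((b - a) / a) * 100.0 for a, b in zip(values, values[1:])]

# A super-sequence (historical segment of a time series) and a proper
# subsequence of it:
super_sequence = [100.0, 110.0, 99.0, 103.95, 101.0]
subsequence = super_sequence[0:3]
print(fluctuation_subsequence(subsequence))  # → [10.0, 10.0]
```

Note that the magnitudes discard the sign of the change: a 10% rise and a 10% fall both contribute the value 10.0.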
The first similarity test quantified the numerical resemblance between sets of descriptive statistical properties of the two super-sequences. The set of statistical properties of a super-sequence is defined as a finite set of the following discrete values: the mean, standard deviation, minimum, maximum, twenty-fifth percentile value, fiftieth percentile value, and seventy-fifth percentile value of the super-sequence. The sheer numerical resemblance between such sets was established regardless of the "ground truth" labeling of the values. Otherwise put, the presence of roughly equal values in sets of statistical properties, even if they aren't allocated to the same property, would still establish a degree of resemblance. This is a departure from the norm in that statistical similarity usually means corresponding properties must be similar (e.g. similar mean values).
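The property set itself can be sketched as follows (a minimal illustration using only the standard library; the linear-interpolation percentile method is our assumption, since the paper does not specify one):

```python
import statistics

def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 100].
    The interpolation method is an assumption, not from the study."""
    idx = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = int(idx), min(int(idx) + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def property_set(segment):
    """Mean, standard deviation, min, max, and 25th/50th/75th percentile
    values of a super-sequence, treated as an unlabeled collection."""
    s = sorted(segment)
    return [statistics.mean(segment), statistics.pstdev(segment),
            s[0], s[-1],
            percentile(s, 25), percentile(s, 50), percentile(s, 75)]

print(property_set([1.0, 2.0, 3.0, 4.0]))
```

Because the test disregards labeling, the returned values are subsequently compared as an unordered collection rather than property by property.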
The second test was simply based on the number of fluctuation subsequences in common between the two super-sequences, even if the super-sequences do not bear resemblance in terms of point-to-point distance. In other words, a match was established if common fluctuation subsequences were at all present (even in a permuted order), which would not necessarily satisfy the criteria of point-to-point distance measures; the latter requires patterns to occur in the corresponding order. Since point-to-point distance measures serve as the conceptual underpinning of most similarity measures including those used in clustering, this is also a departure from the norm in the context of the application.
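An order-free matching of fixed-length fluctuation subsequences can be sketched as follows (our own illustration; the window length and the tolerance for treating two percent-change magnitudes as equal are assumed parameters, not taken from the study):

```python
def pct_changes(values):
    """Percent-change magnitudes from value to value."""
    return [abs((b - a) / a) * 100.0 for a, b in zip(values, values[1:])]

def common_fluctuation_subsequences(seg_x, seg_y, length=3, tol=0.5):
    """Count fluctuation subsequences of seg_x that occur anywhere in
    seg_y, regardless of position (order-free matching). `length` and
    `tol` are illustrative parameters."""
    fx, fy = pct_changes(seg_x), pct_changes(seg_y)
    windows_y = [fy[i:i + length] for i in range(len(fy) - length + 1)]
    count = 0
    for i in range(len(fx) - length + 1):
        w = fx[i:i + length]
        if any(all(abs(a - b) <= tol for a, b in zip(w, wy))
               for wy in windows_y):
            count += 1
    return count

# The pattern (10%, 5%, 2%) opens seg_x but occurs later in seg_y;
# a point-to-point distance measure would not align these segments.
seg_x = [100, 110, 115.5, 117.81]            # changes: 10, 5, 2
seg_y = [50, 49, 53.9, 56.595, 57.7269]      # changes: 2, 10, 5, 2
print(common_fluctuation_subsequences(seg_x, seg_y))  # → 1
```

The usage example shows the departure from point-to-point measures described above: the common pattern component is found even though it occurs at a different position in each super-sequence.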
The newly explored notion of similarity as untethered to ground-truth labeling of statistical properties or pattern permutation order will be termed "free congruence". The practical significance of intersecting descriptive statistical values despite their labeling, and of common pattern components despite permutation order, is a seemingly unlikely yet interesting possibility to methodically explore in the context of time series similarity. This study addresses whether free congruence would accurately suggest stronger similarity between an exemplary period of decline and other periods of decline in a time series than between that exemplary decline period and periods of growth in the same time series. As demonstrated by related work, similarity measures can draw comparisons between time series samples such that similar samples can successfully inform predictions. If the newly explored approaches can first successfully identify similarity in a way that establishes stronger resemblance between decline years (beyond the obvious feature of overall decline itself), further study may be warranted for the application of these similarity measures as classification tools or even predictive indicators. Limitations of this study include the fact that the predictive value of the newly explored similarity measures was not yet evaluated, nor were the parameters for classification fine-tuned with training and testing data; only the ability to rate samples for similarity was examined as an initial step. Tests were carried out on a time series dataset representing cooperative intent expressed in global media sentiment from 1992 to 2017. A year of markedly decreased total cooperative tone in the media was chosen and compared with other years of both decreased and increased cooperative tone for similarity within the new definitions.
Results of the new method were compared with those of DTW on the same data for context.

DATA
Tests were carried out on a time series dataset representing global cooperative tone in the media since 1992 obtained from GDELT, the Global Database of Events, Language and Tone [15]. GDELT aggregates data from global print, broadcast, and web news and applies natural language processing to quantify events and sentiment both globally and by region. The data were accessed via Google BigQuery [16] and queried via standard SQL. The exact GDELT eventcode [17] used to obtain the data was 03, representing 'express intent to cooperate' (not to be confused with 'engage in material cooperation', which is an available filter assigned to another eventcode). Fraction dates represent the percent of the year completed at a given time (ranging from 0 to 0.9999) and are a way of roughly standardizing the temporal distance between dates. Fraction dates do not take into account leap years or the varying lengths of the months. The final data used were normalized as a percent of total, namely the percent of cooperative tone instances in the media out of the total news press (comprising all event codes) for each fractiondate.
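The fraction-date convention can be illustrated as follows (a sketch assuming 30-day months and a 365-day year, mirroring the stated disregard for leap years and varying month lengths; this is not necessarily GDELT's exact formula):

```python
def fraction_date(year, month, day):
    """Year plus the approximate fraction of the year completed,
    assuming 30-day months and a 365-day year (illustrative only)."""
    return year + ((month - 1) * 30 + (day - 1)) / 365.0

print(fraction_date(1992, 1, 1))   # start of the year → 1992.0
print(fraction_date(2017, 7, 2))   # roughly halfway through the year
```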

TESTING PROCESS
The test process for determining the degree of similarity between two segments of a time series can be broadly outlined as follows. Let X and Y denote the two super-sequences being compared.

NUMERICAL RESEMBLANCE BETWEEN STATISTICAL PROPERTY SETS
Let K denote the set of statistical properties of super-sequence Y, and let N denote the 0.3 neighborhood for all values in K (that is, each value of K is taken to match any value within ±0.3 of it). Let the set of common values between N and the set of statistical properties of X be denoted as I. The final result of the test is given by:

G = round((length(I) / length(K)) × 100)
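A sketch of this score in code (an illustrative reconstruction, not the study's exact implementation; we assume the 0.3 neighborhood means a value matches when a value of the other set lies within ±0.3 of it, with each value matched at most once):

```python
def numerical_resemblance(props_x, props_y, radius=0.3):
    """Percent of values in props_y that have a common value in props_x,
    where "common" means lying within the `radius` neighborhood,
    irrespective of which statistical property the values represent."""
    remaining = list(props_x)
    common = []  # plays the role of the intersection set I
    for v in props_y:
        match = next((u for u in remaining if abs(u - v) <= radius), None)
        if match is not None:
            common.append(v)
            remaining.remove(match)  # each value matched at most once
    return round(len(common) / len(props_y) * 100)

# Values overlap even though they would carry different property labels:
print(numerical_resemblance([5.0, 2.1, 9.0], [2.0, 5.2, 7.0]))  # → 67
```

The usage example illustrates the labeling-free aspect: 2.0 and 5.2 find neighbors in the other set regardless of which descriptive statistic each value happens to be.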

FLUCTUATION SUBSEQUENCES
After quantifying the numerical resemblance between sets of statistical properties as detailed above, the second test counted the fluctuation subsequences common to the two super-sequences X and Y, regardless of the order or position in which they occur.

DYNAMIC TIME WARPING
DTW distance was also calculated for each pairing of X and Y. The results of the DTW algorithm were compared with the results of the newly defined similarity measures. DTW is an algorithm for quantifying optimal alignment between two time series or temporal sequences which differ in speed or cadence. Unlike pure Euclidean distance, DTW performs one-to-many and many-to-one matches between data points so that sequential matches in peaks and troughs between datasets can be identified even if they aren't exactly synced in time. DTW was chosen for comparison with the newly explored methods because it can identify similarity where pure Euclidean distance alone may not. As a general definition, DTW calculates the optimal alignment between two sequences or time series by matching each index from one sequence to an index in the other sequence (and vice versa), generating a non-linear "warping" between the series. The degree of warping required to align the sequences optimally is considered to be the cost, and this cost can be used as a similarity measure between two time series. Intuitively, the lower the cost, the more similar the sequences being compared. DTW distance is calculated via an O(nm) algorithm based on dynamic programming to contend with the computational complexity that comes with calculating every possible warping path. FastDTW [18] was implemented in Python [19] via the FastDTW library [20] to calculate DTW distance.
DTW performed on two hypothetical time series A and B is more formally generalized as follows, in accordance with the standard definition [5]. Let A = (a_1, a_2, ..., a_N) and B = (b_1, b_2, ..., b_M). It is assumed DTW is based on a local distance measure, also referred to as a local cost measure; the terms "cost" and "distance" in this context are interchangeable. The local cost measure is the distance function c, which assigns a non-negative cost c(a_n, b_m) to each pair of elements. The closer the resemblance between the elements of sequences A and B, the lower the resultant value of c. Evaluating c for each pair of elements yields the cost matrix C ∈ ℝ^(N×M), with C(n, m) = c(a_n, b_m). The alignment between A and B such that the cost is kept to a minimum will be the optimal alignment. The optimal alignment path can be algorithmically determined, and is a path covering the lowest-cost regions of the matrix. The term alignment path is synonymous with warping path in the conventional nomenclature around DTW. The warping path p = (p_1, ..., p_L), with p_l = (n_l, m_l), therefore makes for an alignment between sequences A and B by associating elements of A to elements of B.

The total cost c_p(A, B) for a warping path p in the context of the cost measure c is:

c_p(A, B) = Σ_{l=1}^{L} c(a_{n_l}, b_{m_l})

The DTW distance between A and B is the total cost of the warping path that minimizes this quantity.
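For reference, this corresponds to a minimal O(NM) dynamic-programming implementation (a sketch using absolute difference as the local cost measure c, a common choice; the study itself used the FastDTW approximation [18] via the FastDTW library [20]):

```python
def dtw_distance(a, b):
    """Classic DTW via dynamic programming: D[i][j] holds the minimal
    total cost of aligning a[:i] with b[:j] under the standard step
    conditions. Local cost is absolute difference (an assumption)."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # step in a only
                                 D[i][j - 1],      # step in b only
                                 D[i - 1][j - 1])  # step in both
    return D[n][m]

# A time-shifted copy of a pattern aligns at zero cost, whereas a
# pointwise Euclidean comparison would report a nonzero distance:
print(dtw_distance([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))  # → 0.0
```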

RESULTS
The test results for similarity measures between the decline year and other historical decline years (test set one) showed that the numerical resemblance test established a stronger resemblance for pairings of decline years, with greater statistical significance than DTW achieved on the same data and sample size, even though overlapping values between the statistical property sets were not necessarily found to represent the same property. This substantiates one aspect of the notion of free congruence: numerical resemblance alone is valuable without the criterion that overlapping values must necessarily represent the same property. This is a surprising result which is, at the very least, worthy of additional investigation. Limitations of this study call for the phenomenon to be examined on larger sample sizes and for further inquiry into possible explanations for the observation.

CONCLUSION
This study explored free congruence, a broader notion of time series similarity that considers numerical resemblance between sets of statistical properties irrespective of value labeling, as well as common fluctuation subsequences irrespective of permutation order. On the data and sample size used, the numerical resemblance test established a stronger resemblance for pairings of decline years, with greater statistical significance than DTW. These initial results suggest that further study may be warranted into the application of these similarity measures as classification tools or even predictive indicators, alongside evaluation on larger sample sizes.

COMPETING INTERESTS
The author declares that they have no competing interests.

ACKNOWLEDGEMENTS
This research would not have been possible without the ongoing support of Mimi and Don Jacaruso and Luciana DeClemente, family of the author.

AVAILABILITY OF DATA AND MATERIALS
Specific datasets related to the research presented in this paper can be obtained as detailed in the work, or can be obtained from the corresponding author upon reasonable request.