Temporal super-resolution traffic flow forecasting via continuous-time network dynamics

Traffic flow forecasting is a critical task for intelligent transportation systems. However, existing forecasting can only be conducted at certain time steps, because the data are discretely collected at these time steps. In contrast, traffic flow in the real world evolves continuously in real time. Therefore, an ideal forecasting paradigm should operate at arbitrary time steps instead of only at these certain time steps. Since the forecasting time steps are no longer restricted by the recording time steps, we call such a paradigm temporal super-resolution forecasting. In this paper, we incorporate the idea of neural ordinary differential equations (neural ODEs) to handle this problem, modeling the change rate of traffic flow on urban roads. Due to the continuous nature of ordinary differential equations, the traffic flow at arbitrary time steps can then be forecasted by computing a definite integral of the change rate. Because urban roads are usually modeled as a network whose change rate can be described by continuous-time network dynamics, we parameterize the network dynamics of traffic flow to quantify this change rate. On these foundations, we propose the spatial-temporal continuous dynamics network to complete the temporal super-resolution forecasting task. Extensive experiments on public traffic flow datasets illustrate that our model achieves high accuracy on temporal super-resolution forecasting while maintaining its performance under conventional experimental settings at the recording time steps.


Introduction
With the urbanization and intelligence of human activities, traffic flow forecasting plays a fundamental role in urban governance. Such a task can significantly enhance various urban computing tasks, e.g., traffic congestion control, vehicular trajectory analysis, estimated time of arrival, and the internet of vehicles [1][2][3][4]. In most cases, traffic flow data are recorded at certain time steps, so the recorded data are inherently discrete. This discrete nature hinders us from handling the information within the intervals between recording time steps: there is no information to contrast against within recording intervals, so most of the information is unrecorded and intractable, and forecasting can only be conducted at the recording time steps. For instance, when the recording interval is 5 min, one cannot forecast the traffic flow after 30 s or 7 min, since there are no supervision signals to guide the forecasting at the corresponding time steps. Under such circumstances, forecasting flexibility is significantly restricted. Moreover, in some cases people need more flexible or more frequent forecasting, e.g., during rescue activities, emergencies, or rush hours; here, the equal-resolution forecasting of the existing paradigm, where the forecasting intervals equal the original recording intervals, is powerless. Therefore, a new forecasting paradigm that goes beyond the original recording time steps will be more valuable to those in need. The motivation is illustrated in Fig. 1A.
This paper aims to make traffic flow forecasting possible at arbitrary time steps, regardless of the recording intervals; in particular, the forecasting intervals can be far smaller than the recording intervals. We call such a forecasting paradigm temporal super-resolution traffic flow forecasting (TSRF for short). Correspondingly, the conventional case, where forecasting is conducted at the recording time steps, is referred to as temporal equal-resolution traffic flow forecasting (TERF for short) in this paper.
Despite its significance, TSRF is very challenging for the following reasons:
• The sparsity of supervision signals. The coarse-grained nature of the data results in extreme sparsity of the provided supervision signals, meaning that the forecasts at most time steps have no supervision signals to be contrasted with. Therefore, we should make full use of the limited supervision signals to optimize the forecasting at all time steps as a whole, which previous studies have not focused on.
• The causality of temporal correlations. Time series data always present strong temporal causality [5], which implies that the evolution within any tiny time interval should obey this abstract but actually existing causality. Because the causality is hard to quantify, it injects extra complexity into TSRF.
• The dependencies of spatial correlations. Although TSRF is conducted in the temporal dimension, traffic flow evolution also exhibits spatial correlations, which should be modeled simultaneously in the forecasting.
Fortunately, the idea of neural ordinary differential equations (neural ODEs) [6] shows that the traffic flow evolution within tiny time intervals is not untraceable. We model the change rate of traffic flow and compute a definite integral of this change rate, thus forecasting the traffic flow at arbitrary time steps thanks to the continuous nature of ordinary differential equations. The solution is shown in Fig. 1B.
Since the urban road system is usually modeled as a network, we use continuous-time network dynamics to describe the change rate. Previous studies confirm that traffic flows of adjacent vertices in road networks are typically similar to each other because vehicles traverse between them frequently [7,8], which can be recognized as the spatially local stationary property in statistics [9]. Because traffic flow spontaneously transfers to adjacent vertices with corresponding transition probabilities, the network dynamics can be quantified as a transition process on the road network, which is divided into two parts: (1) real-time inference of transition probabilities for traffic flow, and (2) calculation of traffic flow transition volume based on the transition probabilities. Intuitively, taking Fig. 1B as an example, the traffic flow of vertex A continuously transfers to vertex B with transition probability p_AB(t) at time step t, causing the traffic flow to increase at B and decrease at A. Therefore, we can model the traffic flow evolution within a tiny time interval in terms of traffic flow transitions, to better capture the spatial correlations.
By incorporating the concept of continuous network dynamics, we can tackle the three challenges mentioned above. Firstly, the continuous nature of network dynamics helps us infer the traffic flow at arbitrary desired time steps. Secondly, because of the additivity of definite integrals, subsequent states are completely determined by previous states, so temporal causality is well preserved. Lastly, we specify the network dynamics in the form of a GNN to capture the complex spatial correlations via the message passing mechanism of GNNs [10]. Based on the above, we design a model, the Spatial-Temporal Continuous Dynamics Network (STCDN). In STCDN, we model the continuous-time network dynamics of traffic flow on the road network and forecast future instantaneous traffic flow at arbitrary desired time steps. Experiments on four public traffic flow datasets illustrate that our model not only achieves high accuracy on temporal super-resolution traffic flow forecasting, but also outperforms other baselines on conventional TERF tasks.
In general, the major contributions can be summarized as follows:
• We identify a common but under-explored issue: existing traffic forecasting studies can only be conducted at certain recording time steps and fail to perform temporal super-resolution forecasting. Therefore, we propose the Spatial-Temporal Continuous Dynamics Network (STCDN) to forecast traffic flow at arbitrary time steps, rather than only at the recording time steps. We call this forecasting paradigm temporal super-resolution forecasting (TSRF).
• To this end, we incorporate the idea of neural ordinary differential equations (neural ODEs) to model the change rate of traffic flow, and forecast the traffic flow at arbitrary time steps via definite integrals. Considering that the urban road system is usually recognized as a network, we specify the change rate of traffic flow as continuous-time network dynamics.
• Extensive experiments illustrate that on temporal super-resolution forecasting tasks, compared with an intuitive solution, i.e., incorporating interpolation algorithms, our model achieves an average performance improvement of 8.0%. Meanwhile, under conventional experimental settings, our model also outperforms other baselines on 10 out of 12 metrics, with an average improvement of 2.60%.

Traffic flow forecasting
Traffic flow forecasting is a typical spatial-temporal modeling task that is important for urban computing and of great significance for the construction of smart cities. Earlier studies focus on traditional statistical methods for analyzing univariate time series; representatives include the historical average (HA), vector auto-regression (VAR) [11,12], the auto-regressive integrated moving average (ARIMA) [13,14], and support vector regression (SVR) [15]. These shallow methods only capture temporal dependencies and simplify traffic flow modeling into individual time-series forecasting; for them to capture complex spatial-temporal correlations, sophisticated and manually designed feature engineering is a precondition. As deep neural networks proved their powerful representation ability, subsequent studies leveraged neural networks for more accurate modeling [16][17][18], although these still completely ignored spatial dependencies. Accordingly, convolutional neural networks (CNNs) were utilized to model spatial correlations of rasterized road networks [19][20][21][22]. Meanwhile, the temporal information can be handled by sequential models such as recurrent neural networks (RNNs) [20,23] or temporal convolutional networks (TCNs) [24,25].
However, CNN-based methods can only be applied to Euclidean data. Therefore, graph neural networks (GNNs) were incorporated to tackle massive non-Euclidean spatial data, giving rise to spatial-temporal graph neural networks [26]. Some graph-based methods adopt prior knowledge to construct a graph structure via a pre-defined adjacency matrix [23,24,27,28], in which the graph structure remains constant because its information is determined by prior knowledge. To boost the representation ability of graph neural networks, various auxiliary adjacency matrices have been introduced to describe spatial relationships from different aspects, such as DTW distances [29,30] for measuring the feature relevance of vertices, or other specific functional relevance (e.g., POI) [20,31]. Recently, some studies have adopted entirely data-driven, optimizable semantic adjacency matrices [25,32,33] to capture latent correlations among vertices. The spatial and temporal information is then integrated and fused in different ways, such as stacking [24,27,34], embedding [23,25], or synchronization [28].
Recently, some studies have introduced neural ODEs to obtain spatial-temporal hidden states with continuous depth, thus greatly improving the representation ability of the model [30,[35][36][37]. Nonetheless, these studies use neural ODEs as a tool to solve their own confronted problems: [30] and [36] use the idea of neural ODEs to solve the problem of spatially unreachable information at remote nodes and to alleviate the over-smoothing issue for spatial information; [35] uses neural ODEs to balance forecasting precision with computational cost; and [37] introduces neural ODEs to handle irregular time series. Although these studies apply neural ODEs to well-motivated problems and achieve good results, they cannot handle temporal super-resolution forecasting.

Super-resolution reconstruction
Super-resolution reconstruction was first widely studied in computer vision, where researchers aim to reconstruct relatively high-resolution images from low-resolution images [38,39]. Earlier studies adopt interpolation algorithms [40,41] to reconstruct fine-grained information on images and obtain high-resolution images. However, super-resolution reconstruction is an inherently ill-posed problem: there always exist multiple high-resolution images corresponding to one original low-resolution image. Therefore, learning-based methods were incorporated to reconstruct super-resolution images with richer semantic information [42,43]. Subsequently, several studies applied the idea of super-resolution reconstruction to reconstruct fine-grained MRI information [44], crowd flow information [45][46][47], and radio map information [48,49], etc. Nonetheless, these studies all focus on fine-grained spatial information reconstruction. Very few studies focus on temporal information reconstruction and temporal super-resolution forecasting. A similar task is missing value imputation of time series [50][51][52]. Nonetheless, the missing value imputation task differs fundamentally from our task: (1) Task difference. The former is an interpolation task, which aims to impute the original incomplete time series data; ours is an extrapolation task, aiming to make the forecasting intervals independent of the recording intervals. (2) Data difference. The former usually tackles corrupted data, which might be caused by device failures, human errors, etc.; in our task, the data may or may not be corrupted, and their coarse-grained, incomplete nature is mainly caused by inherent recording limitations. (3) Purpose difference. The former is usually adopted to repair corrupted data, whereas our task forecasts future traffic flow at flexible time steps.

Definition (traffic network)
Let G = (V, E, A) denote a traffic network, with the set of vertices V (sensing devices) and the set of edges E (geographical or semantic connections). |V| = n indicates that the graph G contains n vertices. A ∈ R^{n×n} denotes the adjacency matrix of G; this paper learns the adjacency matrix A in an end-to-end manner. Additionally, at time step t, the vertices have features X(t) ∈ R^{n×i}, where i is the original input feature dimension.

Temporal super-resolution traffic flow forecasting (TSRF)
The temporal super-resolution traffic flow forecasting task (TSRF) is introduced to forecast traffic flow at arbitrary desired time steps.
Formally, given the traffic network G and h historical observations X(t_{-h} : t_{-1}) = [X(t_{-h}), · · · , X(t_{-1})] from the initial time step t_{-h} to the terminal time step t_{-1}, our target is to find a mapping function F(t, ·) to forecast the traffic flow Y(t) at an arbitrary time step t. Here, t can be any time step that satisfies t ≥ t_0, where t_0 denotes the initial time step of forecasting. We should minimize the error between the forecasted traffic flow and the supervision signals:

Θ* = arg min_Θ L(Y(t), X(t)),  with Y(t) = F(t, X(t_{-h} : t_{-1}); Θ),

where L is the objective function and Θ denotes all trainable parameters.
The historical observations and forecasts are lists of values; we stack them as matrices to facilitate parallelization. Note that we use negative indices to denote the time steps with historical observations, while t_0 denotes the initial time step of forecasting.

Temporal equal-resolution traffic flow forecasting (TERF)
To correspond to the aforementioned TSRF, we introduce the temporal equal-resolution traffic flow forecasting task (TERF) to represent conventional traffic flow forecasting, where the forecasting intervals equal the recording intervals. Formally, given the traffic network G and h historical observations X(t_{-h} : t_{-1}) = [X(t_{-h}), · · · , X(t_{-1})] from the initial time step t_{-h} to the terminal time step t_{-1}, our target is to find a mapping function F(·) to forecast the next q-step traffic flow from the time step t_0 to the terminal t_{q-1} as Y(t_0 : t_{q-1}), and minimize the error between the forecasted traffic flow and the supervision signals X(t_0 : t_{q-1}):

Θ* = arg min_Θ L(Y(t_0 : t_{q-1}), X(t_0 : t_{q-1})),  with Y(t_0 : t_{q-1}) = F(X(t_{-h} : t_{-1}); Θ).

Problem 1 is what our model should tackle, while Problem 2 is the conventional problem definition of previous studies. The difference between the two is whether we can flexibly select the forecasted time step t: in the former, t can be any time step that satisfies t ≥ t_0, while in the latter the forecasted time steps are fixed. In fact, the latter is a subproblem of the former, which implies that solving the former certainly solves the latter, but not vice versa.

An overall solution
From the continuous-time dynamical system view, the continuous depth of neural networks is equivalent to continuous physical time [6,53,54]. Meanwhile, the discrete layers of neural networks can be converted into continuous ones in neural ODEs, which can also be recognized as the change rate of features at each moment [6]. Therefore, we impose continuous depth on the network dynamics via neural ODEs, enabling it to represent continuous physical time.
Firstly, for better representation ability, we use a fully connected layer to map the original input traffic flow feature X(t) ∈ R^{n×i} at time step t into a high-dimensional hidden state H(t) ∈ R^{n×d} with hidden dimension d. We then formulate the network dynamics of traffic flow on the road network as a continuous-time function f(t, H(t)) over time t. The network dynamics takes the form of an ordinary differential equation [6]:

dH(t)/dt = f(t, H(t); Θ),  (3)

where Θ denotes all trainable parameters. Essentially, the network dynamics is the derivative function of the hidden state H(t) over time t. In the context of our task, it can also be interpreted as the instantaneous rate of change of traffic flow. Integrating Eq. (3) over time t from an initial hidden state H(t_0) ∈ R^{n×d} at time step t_0 is equivalent to solving an initial value problem [55]. We can infer the continuous-time instantaneous hidden state H(t) at an arbitrary time step t ≥ t_0 via a definite integral with variable upper bound:

H(t) = H(t_0) + ∫_{t_0}^{t} f(τ, H(τ); Θ) dτ.  (4)

Therefore, from Eq. (4), we can model the traffic flow on the road network as a constant-coefficient dynamical system with parameter sharing over time. The larger the upper bound t, the "deeper" the neural network is; in physical terms, the longer the evolution of traffic flow on the road network takes. Meanwhile, because the upper bound t can be selected arbitrarily as long as it satisfies t ≥ t_0, we can infer the traffic flow at arbitrary time steps regardless of the recording intervals, realizing the TSRF task. Also, the parameter-sharing nature provides the opportunity to optimize the network dynamics as a whole by minimizing the loss on partial time steps. We will use a self-attention-based graph neural network to specify the network dynamics.
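As a minimal illustration of integrating Eq. (4) to an arbitrary upper bound, the sketch below uses the explicit Euler method with a toy, hand-written change-rate function; the linear dynamics `f` here is a hypothetical stand-in for the learned network dynamics, not the model itself.

```python
import numpy as np

def odeint_euler(f, h0, t0, t1, dt=0.01):
    """Approximate H(t1) = H(t0) + integral of f(t, H) dt from t0 to t1
    with the explicit Euler method; t1 can be any value >= t0."""
    h, t = h0.copy(), t0
    while t < t1:
        step = min(dt, t1 - t)  # shrink the last step to land exactly on t1
        h = h + step * f(t, h)
        t += step
    return h

# Toy linear dynamics f(t, H) = -0.5 * H (illustration only).
f = lambda t, h: -0.5 * h
h0 = np.ones((4, 2))                      # 4 vertices, hidden dimension 2
h_mid = odeint_euler(f, h0, 0.0, 0.5)     # a time step between recordings
h_end = odeint_euler(f, h0, 0.0, 1.0)     # a recording time step
```

Because the upper bound of the loop is a free parameter, the same integration routine produces states between recording time steps (`h_mid`) and at them (`h_end`), which is exactly the property TSRF relies on.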
Intuitively, in the context of traffic flow on a road network, the above idea corresponds to a straightforward phenomenon: the ceaseless traffic flow transitions in the road network trigger the evolution of traffic flow over time. This intuition exactly matches conventional physics-guided traffic flow theories [7,8,56].

Overview
The overview of the spatial-temporal continuous dynamics network (STCDN) is shown in Fig. 2. From the illustration, we can see that STCDN is a typical encoder-decoder model [57]. Its most distinctive characteristic is that the model generates a set of continuous dotted curves via the definite integral of the network dynamics, which exactly represent the continuous evolution of the traffic network. We use a graph neural network to parameterize the network dynamics thanks to its message passing nature on graphs [10,53]. According to the definition of ordinary differential equations [6,58], the parameterized network dynamics is equivalent to the change rate of the traffic network with respect to the temporal dimension. The network dynamics is then integrated to form the evolution with continuous patterns, and is respectively coupled with the encoder and decoder of STCDN (Fig. 3).

Network dynamics
Firstly, we specify the network dynamics, denoted as f(·). To better simulate the traffic flow transition on the traffic network, and considering that the network dynamics is equivalent to the change rate of traffic flow, the network dynamics should contain two processes: (1) real-time inference of transition probabilities for traffic flow, and (2) calculation of traffic flow transition volume based on the transition probabilities. Therefore, we design a self-attention-based graph neural network to specify the network dynamics.
Firstly, we infer the real-time transition probabilities for traffic flow. We use the hidden state H(t) ∈ R^{n×d} to describe the traffic flow state on the traffic network at time step t. Given the hidden state H(t) and the adjacency matrix A at time step t, we denote the transition probabilities as a transition matrix M(t) ∈ R^{n×n}:

M(t) = Softmax(Filter(A ∘ ((H(t)Q_1)(H(t)Q_2)^T))),  (5)

where Q_1, Q_2, K ∈ R^{d×d} are trainable parameters and ∘ denotes the Hadamard product. The purpose of having both Q_1 and Q_2 is to obtain representations for the roles of target and source in a directed graph, respectively. Filter(·) is a newly defined function that sets 0-valued entries to negative infinity, ensuring the transition probabilities of non-neighbors are always 0 after Softmax(·) (Fig. 4).
After obtaining the transition probability matrix, the next task is to calculate the traffic flow transition volume C(t) ∈ R^{n×d} based on the transition probabilities (Eqs. (6)-(8)). Combining Eqs. (5)-(8), the network dynamics based on graph neural networks can be written in a fully expanded form. Similar to other self-attention-based algorithms [59,60], we impose a multi-head operation on the network dynamics to reduce parameters and enhance representation ability.
Fig. 4 The illustration of the self-attention-based network dynamics
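The two processes above can be sketched as follows. The masked-softmax transition matrix follows the description of Eq. (5); the transition-volume step is an assumed "inflow minus outflow" form standing in for Eqs. (6)-(8), since their exact expressions are not reproduced here, and the attention scaling by sqrt(d) is likewise an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transition_matrix(H, A, Q1, Q2):
    """Self-attention transition probabilities: scores between source and
    target roles, masked so non-neighbors receive probability 0."""
    scores = (H @ Q1) @ (H @ Q2).T / np.sqrt(H.shape[1])
    scores = np.where(A > 0, scores, -np.inf)   # the Filter(.) function
    return softmax(scores, axis=-1)             # rows sum to 1

def dynamics(H, A, Q1, Q2, K):
    """Assumed change rate: inflow from neighbors minus outflow, so total
    'flow mass' is conserved by the transition process."""
    M = transition_matrix(H, A, Q1, Q2)
    V = H @ K                                   # transformed flow volume
    return M.T @ V - V                          # inflow minus outflow
```

Note that because each row of `M` sums to 1, the net change summed over all vertices is zero, matching the intuition of flow merely transferring between adjacent vertices.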

Encoder
Theoretically, if we have the initial hidden state H(t_0), we can infer the subsequent continuous-time hidden states via Eq. (4). This implies that the initial hidden state should contain rich semantic information. Therefore, we design an encoder to incorporate the information from the historical observations. Formally, we are given the historical observations X(t_{-h} : t_{-1}) = [X(t_{-h}), · · · , X(t_{-1})] from the initial time step t_{-h} to the terminal time step t_{-1}, where X(t_{-h} : t_{-1}) ∈ R^{h×n×i}. We first use a fully connected layer to map them into a hidden space. We then take H(t_{-h}) as the initial state and solve the initial value problem up to the integral upper bound t_{-h+1} via the network dynamics f_E(t, H(t)), obtaining the hidden state at t_{-h+1}:

H(t_{-h+1}) = H(t_{-h}) + ∫_{t_{-h}}^{t_{-h+1}} f_E(τ, H(τ); Θ_E) dτ,  (11)

where Θ_E denotes all trainable parameters in the encoder, implying that the parameters in the encoder and the decoder are not shared. After obtaining H(t_{-h+1}) ∈ R^{n×d}, the historical observation X(t_{-h+1}) ∈ R^{n×i} can be used to complement the hidden state, reducing error accumulation. We use a linear combination to fuse the hidden state H(t_{-h+1}) and the historical observation X(t_{-h+1}), where FC(·) denotes a fully connected layer with an activation function. The above process is performed repeatedly until all historical observations are encoded into an intermediate hidden state H ∈ R^{n×d}. Although TSRF is not performed in the encoder, we still encode the network dynamics, since we observe that the network dynamics can be interpreted as a feature augmentation, enabling the model to achieve better forecasting performance. This hypothesis is confirmed in the ablation study.
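The encoder's alternation of ODE solves and observation fusion can be sketched as below. The Euler inner loop, the shared input projection `W_in`, and the scalar mixing weight `alpha` are simplifications introduced for illustration, not the exact parameterization of the model.

```python
import numpy as np

def encode(X_hist, W_in, f, alpha=0.5, dt=0.1):
    """Encoder sketch: lift the first observation into hidden space, then
    alternate (a) an ODE solve over one recording interval with dynamics f
    and (b) fusion with the next observation to limit error accumulation."""
    H = X_hist[0] @ W_in                          # initial hidden state
    for X in X_hist[1:]:
        for _ in range(int(round(1 / dt))):       # Euler steps, one interval
            H = H + dt * f(H)
        H = alpha * H + (1 - alpha) * (X @ W_in)  # hypothetical linear fusion
    return H
```

With `h` observations, the loop runs `h - 1` times and returns the intermediate hidden state handed to the decoder.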

Decoder
The decoder takes the hidden state H as its initial information. Because no complementary observations are available in the forecasting horizon, we directly integrate the network dynamics f_D(t, H(t)) to forecast the traffic flow Y(t) ∈ R^{n×i} at time step t:

H(t) = H + ∫_{t_0}^{t} f_D(τ, H(τ); Θ_D) dτ,  Y(t) = FC(H(t)),  (13)

where, analogously, Θ_D denotes all trainable parameters in the decoder.
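The decoder's read-out at arbitrary, off-grid time steps can be sketched as follows; the final projection from hidden state to flow values is omitted, and the toy constant dynamics stands in for the learned f_D.

```python
import numpy as np

def decode(H0, f, t_targets, dt=0.01):
    """Decoder sketch: integrate the decoder dynamics f from the encoder
    output H0 and read off the state at each requested time step, which
    need not lie on the recording grid (read-out projection omitted)."""
    outputs, h, t = [], H0.copy(), 0.0
    for t_next in sorted(t_targets):
        while t < t_next - 1e-12:
            step = min(dt, t_next - t)  # land exactly on the target time
            h = h + step * f(t, h)
            t += step
        outputs.append(h.copy())
    return outputs

# e.g. forecasts at 0.5x, 1x, and 1.25x of the recording interval
states = decode(np.zeros(2), lambda t, h: np.ones_like(h), [0.5, 1.0, 1.25])
```

Because integration continues from the previous target rather than restarting, each subsequent state is fully determined by the previous one, preserving temporal causality.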

Optimization
STCDN can be optimized in an end-to-end manner. Formally, we are given the ground truth from t_0 to t_{q-1}, X(t_0 : t_{q-1}), as supervision signals, together with the set of time steps where the supervision signals are located: T = [t_0, t_1, · · · , t_{q-1}]. We use the mean absolute error (MAE) as the loss function:

L(Θ) = (1/|T|) Σ_{t∈T} |Y(t) − X(t)| + λ||Θ||_2,  (14)

where the latter term λ||Θ||_2 denotes L2 regularization for avoiding over-fitting. We only extract the forecasting results at the time steps contained in T to contrast with the ground truth and optimize. Because the network dynamics shares the same parameters over time, when we optimize the model according to the forecasts at these certain time steps, the forecasts at arbitrary time steps are adjusted simultaneously, since all forecasts are generated by the same network dynamics. From this perspective, TSRF can be regarded as a semi-supervised learning task.
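The key point of the loss above, that only time steps in T contribute, can be sketched as follows; the function names and the optional regularization argument are illustrative.

```python
import numpy as np

def supervised_mae(Y_all, X_sup, sup_idx, params=None, lam=0.0):
    """MAE over only the supervised time steps: Y_all holds forecasts at
    every generated time step, but the loss touches just the indices in
    sup_idx (the set T); super-resolution steps stay unsupervised."""
    loss = np.abs(Y_all[sup_idx] - X_sup).mean()
    if params is not None:
        loss += lam * np.sqrt((params ** 2).sum())  # L2 regularization term
    return loss
```

Gradients flow only through the supervised slices, yet because the dynamics parameters are shared across all time steps, the forecasts at unsupervised steps are adjusted as well.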

Numerical integration
We solve the initial value problems shown in Eqs. (4), (11), and (13) with numerical integration methods, such as the Euler method, the Runge-Kutta method, or the Dormand-Prince method [61]. These numerical integration methods can infer the continuous-time instantaneous hidden states determined by the network dynamics.
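For reference, a single step of the classical fourth-order Runge-Kutta method, one of the fixed-step solvers mentioned above, looks like this (a textbook formula, independent of the model):

```python
import numpy as np

def rk4_step(f, t, h, dt):
    """One step of the classical 4th-order Runge-Kutta method: four slope
    evaluations combined with weights 1/6, 2/6, 2/6, 1/6."""
    k1 = f(t, h)
    k2 = f(t + dt / 2, h + dt / 2 * k1)
    k3 = f(t + dt / 2, h + dt / 2 * k2)
    k4 = f(t + dt, h + dt * k3)
    return h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```

Compared with the Euler method, the extra slope evaluations buy fourth-order accuracy, so a much coarser integration step achieves the same error.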

Adjacency matrix
Instead of using a geographical adjacency matrix generated from prior geographical relationships, we adopt a trainable adaptive adjacency matrix to capture semantic relationships [25,32,33], which has been shown to achieve better performance:

A = TopK(Tanh(σ(M_1 M_2^T))),

where M_1, M_2 ∈ R^{n×n′} are trainable parameter matrices with n′ ≪ n, initialized by the Xavier method [62]. σ(·) is the ReLU activation function. The Tanh(·) function constrains the entries within 0 to 1 (together with the ReLU(·) function). The TopK(·) function retains only a certain percentage of edges according to their weights and removes the rest, in order to ensure the sparsity of the adjacency matrix.
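The adaptive adjacency construction above can be sketched as follows; the global top-k over all entries is one reasonable reading of TopK(·), and the function name and ratio default are illustrative.

```python
import numpy as np

def adaptive_adjacency(M1, M2, keep_ratio=0.075):
    """Build a dense semantic adjacency from two node-embedding matrices,
    squash entries into [0, 1) with ReLU then Tanh, and keep only the top
    fraction of edges by weight (TopK sparsification)."""
    A = np.tanh(np.maximum(M1 @ M2.T, 0.0))   # sigma(.) = ReLU, then Tanh
    k = max(1, int(keep_ratio * A.size))
    thresh = np.sort(A, axis=None)[-k]        # k-th largest weight
    return np.where(A >= thresh, A, 0.0)
```

Because `M1` and `M2` have far fewer columns than `n`, the construction stays parameter-efficient while still letting any pair of vertices become connected if the data warrant it.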

Computational complexity
The computational complexity (both time and space) depends on the numerical integration algorithm used, and different numerical integration algorithms have different computational complexity. Specifically, the numerical integration algorithms widely used in neural ordinary differential equations [6] have a significant parameter, the integration step (equivalent to the super-resolution multiplier in our method), which determines the computational complexity. In our method, the computational complexity is slightly higher than that of an RNN (O(L) for an L-length time series) due to the discrete approximation nature of numerical integration algorithms, but it remains linear in the number of time steps, i.e., O(kL) for an L-length time series with integration step 1/k.

Evaluation
In this section, we conduct extensive experiments on four real-world datasets to answer the following questions:
• Q1. How does STCDN perform on temporal super-resolution forecasting (TSRF) tasks?
• Q2. How does STCDN perform on conventional temporal equal-resolution forecasting (TERF) tasks?
• Q3. How does the performance on TSRF tasks change as the forecasting resolution magnification increases?
• Q4. Does encoding the network dynamics in the encoder improve the forecasting performance?

Datasets
We conduct conventional TERF on PeMSD3, PeMSD4, PeMSD7, and PeMSD8; TSRF and the other experiments are conducted on PeMSD4 and PeMSD8. Necessary information about the datasets is given in Table 1. All datasets record traffic flow information aggregated into 5-min intervals, yielding 288 time steps per day. These datasets are benchmarks used in many existing studies [27-30, 63, 64]. Following these studies, for fairness, we adopt the usual preprocessing. Firstly, we apply Z-score normalization to the input information X:

X′ = (X − mean(X)) / std(X),

where mean(·) and std(·) are the mean value and the standard deviation of the input information, respectively. In addition, following these studies, we split each dataset into training, validation, and test sets in chronological order with ratios of 60%, 20%, and 20%, respectively.
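The preprocessing pipeline can be sketched in a few lines; note that computing the statistics over the full series here is a simplification for illustration, while training-set statistics are an equally common convention.

```python
import numpy as np

def preprocess(X, ratios=(0.6, 0.2, 0.2)):
    """Z-score normalize the series, then split it chronologically into
    training / validation / test portions (60/20/20 by default)."""
    Xn = (X - X.mean()) / X.std()
    T = X.shape[0]
    i = int(ratios[0] * T)
    j = i + int(ratios[1] * T)
    return Xn[:i], Xn[i:j], Xn[j:]
```

The chronological (rather than random) split prevents future information from leaking into training, which matters for time series evaluation.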

Basic experimental introduction
We design two types of basic experiments to answer the Q1 and Q2, respectively.

Experimental introduction of TSRF
Performing temporal super-resolution forecasting (TSRF) is one of the most important motivations of this paper. Nonetheless, evaluating the quality of TSRF is a problem: the purpose of TSRF is to forecast at time steps where there is no supervision signal, but without supervision signals we cannot evaluate the forecasting performance of the model. To bridge the gap, we downsample the original data to generate a new dataset with reduced resolution. By doing so, we can train the model on the low-resolution data, while the original data remain high-resolution compared to the training data, so we can use the high-resolution original data to evaluate the temporal super-resolution forecasts. Specifically, in this part, we reduce the resolution by a factor of k = 3 for training, which means the resolution is in effect magnified by a factor of 3 when evaluating TSRF.
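The evaluation protocol reduces to a simple strided slice; the function name is illustrative.

```python
import numpy as np

def downsample(X, k=3):
    """Build the low-resolution training series by keeping every k-th
    record; the withheld records act as ground truth at super-resolution
    time steps during evaluation."""
    return X[::k]

X_high = np.arange(12.0)          # original 5-min records
X_low = downsample(X_high, k=3)   # 15-min training series
```

The model is trained on `X_low` only; the records of `X_high` dropped by the stride supply the supervision-free time steps at which TSRF accuracy is measured.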

Experimental introduction of TERF
The TERF task is a conventional task widely conducted in previous studies [27-30, 63, 64]. We follow the experimental settings of previous studies for fairness, using the past 12 time steps of historical observations to forecast the traffic flow of the future 12 time steps, where the recording interval is 5 min.

Baselines
To the best of our knowledge, there is no spatial-temporal modeling algorithm dedicated to independently performing TSRF tasks. We divide all baselines into basic baseline models and interpolation models. The former class can complete TERF. We introduce a compromise: imposing interpolation models on the forecasting results of the basic baseline models to simulate TSRF tasks.

Basic baseline models
• Vector auto-regression (VAR) [11], a time series model that captures the pairwise relationships among time series;
• Long short-term memory (LSTM) [65], a classical variant of recurrent neural networks (RNNs) for time series;
• Diffusion convolutional recurrent neural network (DCRNN) [23], in which the spatial dependencies are captured by random walks and the temporal dependencies are captured by RNNs;
• Spatial-temporal graph convolutional network (STGCN) [24], which formulates the problem on graphs and builds the model with complete convolutional structures;
• Graph WaveNet (GWN) [25], a framework for deep spatial-temporal graph modeling, which applies a learnable adaptive adjacency matrix to capture hidden spatial dependencies;
• Attention-based spatial-temporal graph convolutional network (ASTGCN) [27], which designs spatial and temporal attention mechanisms to model spatial and temporal dynamics, respectively;
• Spatial-temporal synchronous graph convolutional network (STSGCN) [28], which synchronously captures spatial and temporal information with stacked graph convolutional neural networks;
• Spatial-temporal fusion graph neural network (STFGNN) [29], which adopts a data-driven temporal graph to compensate for existing correlations that a spatial graph may not reflect;
• Spatial-temporal graph ODE network (STGODE) [30], a model that captures spatial-temporal dynamics through a tensor-based ordinary differential equation.

Interpolation models
• Lagrange interpolation (LAG), an interpolation algorithm based on polynomials [66];
• Slinear interpolation (SLI), a spline interpolation of first order;
• Quadratic interpolation (QUA), a spline interpolation of second order;
• Cubic interpolation (CUB), a spline interpolation of third order [67];
• Linear interpolation (LIN), an interpolation algorithm whose interpolation function is a polynomial of the first degree;
• Nearest interpolation (NEA), an interpolation algorithm that selects the nearest information to perform interpolation.
Interpolation requires coarse-grained sampling points, which must be provided by another forecasting algorithm, referred to as the basic model. In this paper, we select DCRNN [23], STGCN [24], Graph WaveNet [25], ASTGCN [27], and STGODE [30] as basic models and perform interpolation on the forecasting results of these models.
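As an illustrative sketch (not the exact experimental pipeline), the spline-family interpolation baselines above can be realized with `scipy.interpolate.interp1d`, upsampling a basic model's coarse forecast by a magnification k. The forecast values and time steps below are hypothetical:

```python
import numpy as np
from scipy.interpolate import interp1d

def upsample_forecast(t_coarse, y_coarse, k, kind="slinear"):
    """Upsample a coarse-grained forecast to k-times finer time steps
    by interpolating between the basic model's outputs.
    `kind` may be 'slinear', 'quadratic', 'cubic', 'linear', or 'nearest'."""
    f = interp1d(t_coarse, y_coarse, kind=kind, axis=0)
    t_fine = np.linspace(t_coarse[0], t_coarse[-1],
                         (len(t_coarse) - 1) * k + 1)
    return t_fine, f(t_fine)

# Hypothetical coarse forecast at 5-min steps (e.g., a basic model's output)
t = np.array([0.0, 5.0, 10.0, 15.0])
y = np.array([100.0, 120.0, 90.0, 110.0])
t_fine, y_fine = upsample_forecast(t, y, k=3, kind="slinear")  # 100-s steps
```

Note that the interpolant only densifies the basic model's outputs; its quality is bounded by the coarse forecast it starts from, which is the dependence discussed below.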

Implementation settings
Our experiments were conducted on machines with Tesla V100 GPU cards. We implement our algorithm in PyTorch. The batch size is set to 32, the number of attention heads (Z) is 8, the hidden dimension of our algorithm (d) is 128, and the learning rate is 0.0003. We use the Adam optimizer [68] to train our model. In the adjacency matrix, we retain the 7.5% of edges with the largest weights and remove the others. For numerical integration, we select the fifth-order Dormand-Prince method [61]. Notably, the numerical integration method can be chosen freely, e.g., the Euler method, Runge-Kutta methods, etc.
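To illustrate how definite integration of a change rate yields forecasts at arbitrary time steps, the sketch below integrates a toy dynamics on a 3-node graph with SciPy's RK45 solver, which implements the Dormand-Prince 5(4) pair. The linear diffusion dynamics, the graph, and the coefficient are illustrative stand-ins for the learned network dynamics, not the paper's model:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy stand-in for the learned network dynamics f(t, x): linear diffusion
# on a 3-node line graph. In STCDN, f would be a parameterized network.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian

def dynamics(t, x):
    # dx/dt: flow diffuses toward neighboring nodes
    return -0.5 * L @ x

x0 = np.array([10.0, 0.0, 0.0])   # traffic state at t0
t_eval = [0.0, 0.5, 2.5, 7.0]     # arbitrary time steps, not a fixed grid
sol = solve_ivp(dynamics, (0.0, 7.0), x0, method="RK45", t_eval=t_eval)
states = sol.y.T                  # state at each requested time step
```

Because the solver evaluates the trajectory continuously, `t_eval` can be any set of time steps, which is exactly what frees the forecasting intervals from the recording intervals.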

Evaluation metrics
Mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are used as the evaluation metrics. These metrics are defined as:

MAE = (1/q) Σ_{i=0}^{q−1} |Y(t_i) − X(t_i)|,
RMSE = √( (1/q) Σ_{i=0}^{q−1} (Y(t_i) − X(t_i))² ),
MAPE = (1/q) Σ_{i=0}^{q−1} |Y(t_i) − X(t_i)| / (|X(t_i)| + σ),

where X = [X(t_0), X(t_1), ..., X(t_{q−1})] and Y = [Y(t_0), Y(t_1), ..., Y(t_{q−1})] denote the ground truth and the predictions of length q, respectively. T = [t_0, t_1, ..., t_{q−1}] is the set of time steps where the supervision signals are located. σ denotes a tiny shift that prevents the denominator from being zero. Experiments on all datasets are conducted at least 5 times with different random seeds, and the reported metrics are the mean values over all runs. Note that we use the baseline results reported by their authors when available; otherwise, we tune the hyperparameters of the baselines carefully, and the detailed hyperparameter settings for the baselines are given on GitHub.
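The three metrics can be implemented directly; the minimal NumPy sketch below uses illustrative values:

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, sigma=1e-5):
    # sigma is the tiny shift preventing a zero denominator
    return np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + sigma))

# Illustrative ground truth and predictions
y_true = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 190.0, 55.0])
```

MAPE is a relative error, which is why a model can lead on MAE and RMSE yet lose on MAPE when its errors concentrate on low-flow time steps.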

Performance comparison of TSRF
The detailed comparison of TSRF is given in Table 2. Note that the comparison in Table 2 is conducted under forecasting resolution magnification k = 3; other forecasting resolution magnifications are analyzed in a subsequent section.
From the comparison, we can see that STCDN achieves the best performance on the TSRF tasks. Concretely, compared with the second-best model, STCDN achieves average improvements of 7.58% and 8.40% on the two datasets, respectively. Such superiority mainly comes from the fundamental difference between STCDN and the other baseline models.
Interpolation-based methods exhibit weaker overall performance. On one hand, their performance is highly dependent on the basic baselines they rely on. Moreover, basic baselines that perform well on the conventional TERF task are not necessarily well-suited to the TSRF task. For instance, GWN performs well on the TSRF task when used as a basic baseline, but fails to be competitive on the TERF task, as shown in Table 3. This makes selecting an ideal basic baseline difficult and tricky. Meanwhile, since different interpolation strategies consider different orders, the performance of these methods also depends on the choice of interpolation strategy. These two issues make constructing interpolation-based methods for the TSRF task resemble a time-consuming exercise in permutation and combination. Even so, this approach does not achieve the desired performance.
In contrast, STCDN models the traffic flow on the road network in a physics-guided way, i.e., as a continuous-time dynamical system. Both classical physics-guided theories [56] and intuitive human perception of traffic flow transitions [7, 8] suggest that, compared with conventional spatial-temporal models that directly fuse spatial and temporal information [27-30, 63, 64], our continuous-time modeling of traffic flow transitions better reflects the nature of the road network as a complex physical system. Meanwhile, this continuous-time nature also allows the model to approximate the real traffic flow at arbitrary time steps in a more natural manner.

Performance comparison of TERF
The performance comparison of TERF is given in Table 3, under the conventional experimental settings. The comparison shows that our proposed model overall outperforms the other baselines on the four public traffic flow datasets, leading on 10 out of 12 metrics with an average improvement of 2.60%. As mentioned above, the modeling idea of STCDN is fundamentally different from directly fusing spatial and temporal information, and better matches the natural evolution of traffic flow. Additionally, the numerical integration generates many temporary hidden states, which can be regarded as feature augmentation. Therefore, we hypothesize that the performance superiority mainly comes from the augmented features provided by these generated hidden states, which will be verified in the subsequent ablation studies. (In both tables, bold fonts denote the best performance and underlines denote the second-best performance.) Among the baselines, STFGNN [29] and STGODE [30] achieve the second-best performance. Both utilize DTW-augmented graphs for feature augmentation, so their superiority mainly comes from more powerful feature engineering. Interestingly, in the previous TSRF comparison, GWN-based algorithms achieve strong performance with interpolation, yet GWN does not perform well on the TERF tasks. This again confirms the randomness and uncertainty of selecting a combination of basic baseline and interpolation strategy when performing TSRF. The experiments also illustrate that STCDN not only excels at TSRF with significant performance improvements, but also remains strongly competitive on conventional TERF tasks. Notably, our model significantly outperforms all baselines on MAE and RMSE, but fails to achieve the best MAPE.
This is because the network dynamics we model are, mathematically, a first-order ordinary differential equation. When only first-order terms are considered, without incorporating higher-order terms, the modeled evolution trajectories of traffic flow are smooth. Such smoothness yields excellent average forecasting performance across all time steps, but tends to be insensitive to mutations or jumps. Addressing this will be a direction of our future studies.

Parameter sensitivity analysis
To answer Q3, we discuss the influence of the forecasting resolution magnification k on the accuracy of TSRF.
Based on the settings of the TSRF tasks, we discuss the influence of the forecasting resolution magnification k, ranging from 2 to 6, on the PeMSD4 and PeMSD8 datasets, corresponding to forecasting intervals from 150 s down to 50 s under the 5-min recording interval. As a comparison, we also report the TSRF performance of the basic models combined with the Slinear interpolation algorithm, which presents the overall best TSRF performance among all baselines. The TSRF comparison under different resolution magnifications is shown in Fig. 5. Generally, the comparison illustrates that as the resolution magnification increases, the forecasting performance of all algorithms degrades, which is intuitive: a larger resolution magnification implies that more information must be forecasted while less information is known, increasing the difficulty of accurate forecasting. Additionally, in our settings, we expand the recording intervals to simulate the case of larger intervals, which also inflates the numerical values of the errors; this does not prevent us from comparing the relative performance among models.
From the comparison, we can see that our model consistently achieves the best performance among these algorithms on this task. Meanwhile, as the forecasting resolution magnification k increases, the performance degradation of our algorithm is minimal overall. The experiment illustrates that even when only sparse recorded data serve as supervision signals, STCDN can still perform accurate TSRF compared with the other baselines.

Ablation study
In this subsection, to answer Q4, we analyze how the continuous-time hidden states generated by integrating the network dynamics affect forecasting performance.
To this end, we no longer regard the network dynamics, given in Eq. (10), as dynamical information, but as a simple spatial information extractor. Accordingly, we do not perform a definite integral along the temporal dimension. Instead, we follow the idea of DCRNN [23] and use an RNN block to handle the temporal information, so that the ablated model follows the conventional strategy of fusing spatial and temporal information. This ablation model is referred to as STCDN (w/o dynamics). The experiments are conducted on all datasets, and all evaluation metrics are obtained via TERF tasks. Results are shown in Table 4.
From the comparison, we can see that the continuous-time hidden states obtained from the network-dynamics view enhance forecasting performance by 4.50% on average. We hypothesize that this superiority can be explained from two aspects: (1) the continuous-time hidden states, generated in a physics-guided way, better approximate the continuous evolution of traffic flow on the road network, giving STCDN stronger representation ability; (2) the continuous-time hidden states can also be regarded as a form of feature augmentation that improves forecasting performance.

Conclusions and future work
In this paper, we observe that the widely applied recording scheme for traffic flow yields coarse-grained and incomplete supervision signals, so that the forecasting intervals are strictly limited by the recording intervals. Therefore, we propose a new task named Temporal Super-Resolution Traffic Flow Forecasting to free the forecasting intervals from the limitations of the recording intervals. Specifically, we regard the traffic flow on the road network as a continuous-time dynamical system. By modeling the network dynamics and incorporating the idea of ordinary differential equations, we can model continuous-time hidden states and further infer the traffic flow at arbitrary desired time steps. This is a novel attempt at making the forecasting intervals independent of the recording intervals, enabling traffic forecasting with more flexible intervals. Theoretically, this continuous-time dynamical-system idea can be extended to any domain related to time series, not just traffic flow forecasting, and can therefore benefit more potential applications. On the other hand, our solution to this novel task is still immature, and we must confront some problems, such as insensitivity to mutations or jumps and high computational cost. In the future, we will work to solve these problems and improve the idea to better tackle this novel problem.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Yi Xie is currently working toward the Ph.D. degree in software engineering at Fudan University, Shanghai, China. His research interests include spatial-temporal data mining and time series modeling.
Yun Xiong received the Ph.D. degree in computer and software theory from Fudan University. She is a professor of computer science at Fudan University, Shanghai, China. Her research interests include database and data mining. Jiawei Zhang received the bachelor's degree in computer science from Nanjing University, China, in 2012, and the Ph.D. degree in computer science from the University of Illinois at Chicago, in 2017. He has been an assistant professor with the Department of Computer Science, Florida State University, Tallahassee, FL, since 2017. He founded IFM Lab in 2017, and has been working as the director since then. IFM Lab is a research oriented academic laboratory, providing the latest information on fusion learning and data mining research works and application tools to both academia and industry.