To obtain sufficient temporal information while avoiding the cost of processing irrelevant history, this paper proposes the overall framework of the multi-component spatial-temporal graph convolution network (MCSGCN), which describes the recent, daily-cycle and weekly-cycle characteristics of the prediction target (GCN: spatial dimensional graph convolution; Conv: temporal dimensional convolution; FC: fully connected layer). For example, to predict the traffic volume of a highway section at 10:00, the traffic data within the hour before 10:00 (9:00–10:00), together with the data around 10:00 one day earlier and one week earlier, all provide useful information for the prediction.
Suppose the sampling frequency is q times a day, that is, each day contributes q points to the time series. Let the current moment be t0 and the length of the prediction window be Tp. Three time-series fragments of lengths Th, Td and Tw are intercepted along the time axis as inputs to the model's recent, daily-cycle and weekly-cycle components, as shown in Fig. 2 (there, the prediction length Tp is 1 hour and Th, Td and Tw are all twice Tp). Th, Td and Tw are all integer multiples of Tp, and the three fragments are as follows:
(1) Recent fragment
\({\chi _h}=({X_{{t_0} - {T_h}+1}},{X_{{t_0} - {T_h}+2}},...,{X_{{t_0}}}) \in {{\mathbb{R}}^{F \times N \times {T_h}}}\) , that is, the historical time-series segment directly adjacent to the prediction window, shown as the green part in Fig. 2.
(2) Daily cycle fragment
$${\chi _d}=({X_{{t_0} - ({T_d}/{T_p}) \times q+1}},...,{X_{{t_0} - ({T_d}/{T_p}) \times q+{T_p}}},{X_{{t_0} - ({T_d}/{T_p} - 1) \times q+1}},...,{X_{{t_0} - ({T_d}/{T_p} - 1) \times q+{T_p}}},...,{X_{{t_0} - q+1}},...,{X_{{t_0} - q+{T_p}}}) \in {{\mathbb{R}}^{F \times N \times {T_d}}}$$
7
It is composed of sequence fragments covering the same period of day as the prediction window on the several days preceding it, shown as the red part in Fig. 2. Owing to morning and evening peaks, people's daily routines and other factors, traffic data at the same time of day show strong similarity across days. The purpose of the daily-cycle component is to model this daily periodic pattern of traffic data.
(3) Weekly cycle fragment
\({\chi _w}=({X_{{t_0} - 7 \times ({T_w}/{T_p}) \times q+1}},...,{X_{{t_0} - 7 \times ({T_w}/{T_p}) \times q+{T_p}}},{X_{{t_0} - 7 \times ({T_w}/{T_p} - 1) \times q+1}},...,{X_{{t_0} - 7 \times ({T_w}/{T_p} - 1) \times q+{T_p}}},...,{X_{{t_0} - 7 \times q+1}},...,{X_{{t_0} - 7 \times q+{T_p}}}) \in {{\mathbb{R}}^{F \times N \times {T_w}}}\)
It is composed of sequence fragments with the same weekday attribute and the same time of day as the prediction window, taken from the weeks preceding it, shown as the blue part in Fig. 2. Traffic flow data show obvious weekly patterns: for example, the traffic pattern on a Monday resembles the pattern on historical Mondays but may differ from that on weekends. The purpose of the weekly-cycle component is to model this weekly periodic behavior of traffic data.
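The three intercepted fragments can be sketched as index computations. The helper below is a minimal illustration of the definitions above; the function name and the concrete values of q, Tp and t0 are assumptions, not from the paper.

```python
def fragment_indices(t0, q, Tp, Th, Td, Tw):
    """Time indices of the recent, daily-cycle and weekly-cycle fragments
    relative to forecast start t0 (illustrative helper, not the paper's code)."""
    # Recent: the Th steps immediately before t0.
    recent = list(range(t0 - Th + 1, t0 + 1))
    # Daily cycle: Td/Tp blocks of Tp steps, one per preceding day,
    # aligned with the forecast window (q samples per day).
    daily = []
    for i in range(Td // Tp, 0, -1):
        daily += list(range(t0 - i * q + 1, t0 - i * q + Tp + 1))
    # Weekly cycle: Tw/Tp blocks, one per preceding week (7 days apart).
    weekly = []
    for i in range(Tw // Tp, 0, -1):
        weekly += list(range(t0 - 7 * i * q + 1, t0 - 7 * i * q + Tp + 1))
    return recent, daily, weekly

# Example: 5-minute sampling (q = 288 per day), Tp = 12 steps (1 hour),
# Th = Td = Tw = 24, i.e. twice Tp, as in Fig. 2.
r, d, w = fragment_indices(t0=10000, q=288, Tp=12, Th=24, Td=24, Tw=24)
print(len(r), len(d), len(w))  # 24 24 24
```

Each fragment has the same node and feature dimensions; only the set of time steps differs, matching the definitions of χh, χd and χw.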
The three components share the same network structure, each consisting of a spatial-temporal convolution module (Fig. 3) and a fully connected layer. The spatial-temporal convolution module comprises two parts: graph convolution in the spatial dimension and standard 2-dimensional convolution in the temporal dimension. The model finally fuses the outputs of the three components through parameter matrices to obtain the final prediction. This structure effectively captures the temporal and spatial characteristics of traffic data and their spatial-temporal correlation.
3.1 Spatial dimensional graph convolution
The model adopts the spectral graph method (Briand et al. 2017) to extend the convolution operation to graph-structured data: the data are treated as signals on the graph and processed directly on it to capture meaningful spatial patterns and features. The spectral method analyzes the graph structure by casting the graph into algebraic form; in this paper it exploits the connectivity and mutual influence between nodes. In spectral graph theory, a graph is represented by its Laplacian matrix, and the properties of the graph structure are obtained by analyzing the Laplacian and its eigenvalues. The Laplacian of the graph is defined as L = D − A, and its normalized form is \(L={I_N} - {D^{ - \frac{1}{2}}}A{D^{ - \frac{1}{2}}} \in {{\mathbb{R}}^{N \times N}}\), where A is the adjacency matrix, IN is the identity matrix, and the degree matrix \(D \in {{\mathbb{R}}^{N \times N}}\) is the diagonal matrix of node degrees, \({D_{ii}}=\sum\nolimits_{j} {{A_{ij}}}\). The eigenvalue decomposition of the Laplacian is \(L=U\Lambda {U^T}\), where \(\Lambda =diag([{\lambda _0},...,{\lambda _{N - 1}}]) \in {{\mathbb{R}}^{N \times N}}\) is the diagonal matrix of eigenvalues of L and U is the Fourier basis. Taking the traffic data at time t as an example, with graph signal \(x={x^f} \in {{\mathbb{R}}^N}\), the graph Fourier transform is \(\widehat{x}={U}^{T}x\). Since U is orthogonal by the properties of the Laplacian, the inverse Fourier transform is \(x=U\widehat{x}\). Graph convolution replaces the classical convolution operator with a linear operator diagonalized in the Fourier domain; convolving a signal x on graph G with kernel \({g_\theta }\) gives
$${g_{\theta \times G}}x={g_\theta }(L)x={g_\theta }(U\Lambda {U^T})x=U{g_\theta }(\Lambda ){U^T}x$$
8
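The Laplacian construction and the graph Fourier transform pair can be verified numerically. The sketch below assumes a small 4-node ring graph purely for illustration.

```python
import numpy as np

# Adjacency matrix of an illustrative 4-node ring graph.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
N = A.shape[0]

D = np.diag(A.sum(axis=1))                       # degree matrix, D_ii = sum_j A_ij
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L = np.eye(N) - D_inv_sqrt @ A @ D_inv_sqrt      # normalized L = I_N - D^{-1/2} A D^{-1/2}

# Eigendecomposition L = U Lambda U^T (L is symmetric, so eigh applies).
lam, U = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 3.0, 4.0])               # a graph signal x in R^N
x_hat = U.T @ x                                  # graph Fourier transform
x_back = U @ x_hat                               # inverse transform (U is orthogonal)
print(np.allclose(x, x_back))                    # True
```

The round trip confirms that U^T and U implement a consistent transform pair, which is what Eq. (8) relies on.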
Since convolution in the vertex domain corresponds, after the Fourier transform, to the product of the transformed signals, the above equation can be read as transforming \({g_\theta }\) and x into the spectral domain, multiplying the two transforms, and applying the inverse Fourier transform to obtain the convolution result. Graph convolution thus performs convolution on the graph by moving into the spectral domain. However, when the graph is large, the eigenvalue decomposition of the Laplacian is expensive to compute directly, so this paper adopts a Chebyshev polynomial approximation:
$${g_{\theta \times G}}x={g_\theta }(L)x \approx \sum\limits_{{k=0}}^{{K - 1}} {{\theta _k}{T_k}(\tilde {L})x}$$
9
\({\theta _k}\) is the k-th Chebyshev polynomial coefficient, with \(\theta \in {{\mathbb{R}}^K}\); \(\tilde {L}=\frac{2}{{{\lambda _{\hbox{max} }}}}L - {I_N}\), where \({\lambda _{\hbox{max} }}\) is the maximum eigenvalue of the Laplacian. The Chebyshev polynomials are defined recursively by \({T_k}(x)=2x{T_{k - 1}}(x) - {T_{k - 2}}(x)\), with \({T_0}(x)=1,{T_1}(x)=x\). The Chebyshev expansion is equivalent to using the convolution kernel to extract information from the 0th- to (K−1)th-order neighborhoods around each node in the graph.
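The recursion in Eq. (9) can be sketched directly. The function below is a minimal illustration of the K-order Chebyshev approximation; the function name and the example graph are assumptions for demonstration only.

```python
import numpy as np

def cheb_graph_conv(L, x, theta, lam_max=2.0):
    """Chebyshev approximation of the graph convolution in Eq. (9).
    L: normalized Laplacian; x: graph signal; theta: K coefficients."""
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)         # scaled Laplacian
    Tk_prev, Tk = np.eye(N), L_tilde                  # T_0 = I, T_1 = L_tilde
    out = theta[0] * (Tk_prev @ x)                    # k = 0 term
    for k in range(1, len(theta)):
        out += theta[k] * (Tk @ x)                    # k-th term
        Tk_prev, Tk = Tk, 2 * L_tilde @ Tk - Tk_prev  # T_k = 2 L~ T_{k-1} - T_{k-2}
    return out

# Illustrative 3-node path graph with normalized Laplacian.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D_is = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(3) - D_is @ A @ D_is
y = cheb_graph_conv(L, np.array([1.0, 2.0, 3.0]), theta=[0.5, 0.3, 0.2])
```

With K = 1 (only θ0) the operation reduces to scaling the signal itself, since T0 is the identity; larger K mixes progressively wider neighborhoods.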
The graph convolution module uses the rectified linear unit (ReLU) as activation function, i.e., \(\operatorname{Re} LU({g_{\theta \times G}}x)\).
Taking K = 2 as an example, 0th- and 1st-order neighbor information is extracted for each node, and the eigenvalues of the Laplacian are scaled so that \({\lambda _{\hbox{max} }}=2\). The graph convolution operation then becomes:
$${g_{\theta \times G}}x \approx {\theta _0}{T_0}(\tilde {L})x+{\theta _1}{T_1}(\tilde {L})x={\theta _0}x+{\theta _1}(L - {I_N})x={\theta _0}x+{\theta _1}( - {D^{ - \frac{1}{2}}}A{D^{ - \frac{1}{2}}})x$$
10
To reduce the number of parameters, let \({\theta _0}= - {\theta _1}=\theta \in {\mathbb{R}}\); then \({g_{\theta \times G}}x \approx ({I_N}+{D^{ - \frac{1}{2}}}A{D^{ - \frac{1}{2}}})x\theta\), with all nodes on the graph sharing the kernel weight \(\theta\). At the same time, to avoid numerical instability, exploding gradients or vanishing gradients, let \(\tilde {A}=A+{I_N},{\tilde {D}_{ii}}=\sum\nolimits_{j} {{{\tilde {A}}_{ij}}}\), which gives \({g_{\theta \times G}}x={\tilde {D}^{ - \frac{1}{2}}}\tilde {A}{\tilde {D}^{ - \frac{1}{2}}}x\theta\). Extending to multidimensional data, the input to the r-th convolution layer is
$$\chi _{h}^{{(r - 1)}}=({X^{1,(r - 1)}},{X^{2,(r - 1)}},...,{X^{{C_{r - 1}},(r - 1)}}) \in {{\mathbb{R}}^{{C_{r - 1}} \times N \times {T_{r - 1}}}}$$
where \(r \in \{ 1,...,l\}\) (l is the number of spatial-temporal convolution layers); Cr−1 is the number of channels of the input to layer r, with C0 = F when r = 1; Tr−1 is the time-dimension length of the input. When r = 1, T0 = Th for the recent component (T0 = Td for the daily-cycle component, T0 = Tw for the weekly-cycle component). The graph convolution of \(\chi _{h}^{{(r - 1)}}\) can be expressed as \({g_{\Theta \times G}}\chi _{h}^{{(r - 1)}}={\tilde {D}^{ - \frac{1}{2}}}\tilde {A}{\tilde {D}^{ - \frac{1}{2}}}\chi _{h}^{{(r - 1)}}\Theta\), in which \(\Theta =({\Theta _1},{\Theta _2},...,{\Theta _{{C_r}}}) \in {{\mathbb{R}}^{1 \times {C_{r - 1}} \times {C_r}}}\) is the convolution kernel parameter.
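The renormalization trick above (Ã = A + I_N) keeps the propagation matrix numerically well behaved. A minimal sketch, assuming an illustrative 3-node path graph and a scalar shared weight θ:

```python
import numpy as np

# Illustrative path graph; A, theta and x are assumptions, not paper values.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
N = A.shape[0]

A_tilde = A + np.eye(N)                       # add self-loops: A~ = A + I_N
D_tilde = A_tilde.sum(axis=1)                 # D~_ii = sum_j A~_ij
D_is = np.diag(1.0 / np.sqrt(D_tilde))
P = D_is @ A_tilde @ D_is                     # D~^{-1/2} A~ D~^{-1/2}

theta = 0.5                                   # kernel weight shared by all nodes
x = np.array([1.0, 0.0, 2.0])
y = P @ x * theta                             # g_theta *G x for one channel
```

The eigenvalues of P lie in [−1, 1], so repeated application across layers neither explodes nor vanishes, which is the point of the renormalization.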
3.2 Time dimensional convolution
After the spatial features of the input data are modeled by the graph convolution operation, temporal features are captured by a standard 2-dimensional convolution. Taking the extraction of recent features with the ReLU activation as an example, \(\chi _{h}^{{(r)}}=ReLU(\psi \times (ReLU({g_{\Theta \times G}}\chi _{h}^{{(r - 1)}}))) \in {{\mathbb{R}}^{{C_r} \times N \times {T_r}}}\), where \(\psi\) is the parameter of the time-dimension convolution kernel; the specific convolution process is shown in Fig. 2.
After one layer of time-dimension convolution, each node's information is updated with information from its adjacent time slices, and after the graph convolution operation those slices already contain information from the node's spatially adjacent nodes. Therefore, a single layer of spatial-temporal convolution captures both the temporal and spatial characteristics of the data and their spatial-temporal correlation. This paper stacks multiple spatial-temporal convolution layers to extract deeper spatial-temporal information, and a fully connected layer then maps the convolution output to the dimension of the prediction target. The fully connected module also uses the ReLU activation function.
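One spatial-temporal layer, graph convolution followed by temporal convolution with ReLU after each, can be sketched as below. This is an illustrative implementation with assumed shapes, not the paper's exact code; the temporal step is written as an explicit sliding window for clarity.

```python
import numpy as np

def st_conv_block(X, P, Theta, psi):
    """One spatial-temporal convolution layer (illustrative sketch).
    X: (C_in, N, T) input; P: (N, N) matrix D~^{-1/2} A~ D~^{-1/2};
    Theta: (C_in, C_mid) graph-conv channel weights;
    psi: (C_out, C_mid, Kt) temporal convolution kernel."""
    relu = lambda z: np.maximum(z, 0.0)
    # Spatial graph convolution: aggregate neighbors, then mix channels.
    G = np.einsum('nm,cmt->cnt', P, X)                # P X for every channel/time
    G = relu(np.einsum('cnt,cd->dnt', G, Theta))      # channel mixing by Theta
    # Temporal convolution: standard convolution along the time axis.
    C_out, _, Kt = psi.shape
    T_out = G.shape[2] - Kt + 1
    out = np.zeros((C_out, G.shape[1], T_out))
    for t in range(T_out):
        out[:, :, t] = np.einsum('cnk,dck->dn', G[:, :, t:t + Kt], psi)
    return relu(out)

rng = np.random.default_rng(0)
X = rng.random((2, 4, 12))            # C_in=2 features, N=4 nodes, T=12 steps
P = np.eye(4)                         # identity propagation, for illustration
Y = st_conv_block(X, P, rng.random((2, 8)), rng.random((16, 8, 3)))
print(Y.shape)                        # (16, 4, 10)
```

Note how the time dimension shrinks from T to T − Kt + 1 after the temporal convolution, which is why the output is later mapped by a fully connected layer to the prediction horizon Tp.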
3.3 Multi component fusion
For the fused output of the recent, daily-cycle and weekly-cycle components, take the flow prediction at 8:30 a.m. on a Monday as an example: some highway sections show a pronounced morning and evening peak cycle, so the daily-cycle and weekly-cycle components strongly influence the prediction result; other sections have no obvious traffic periodicity, and these two components contribute little to the prediction target. Different nodes are therefore affected by the components to different degrees, and this difference must be reflected when fusing the component outputs. After learning from the historical data, the fused final prediction is:
$$\hat {Y}={W_h} \odot {\hat {Y}_h}+{W_d} \odot {\hat {Y}_d}+{W_w} \odot {\hat {Y}_w}$$
11
in which \(\odot\) is the Hadamard product (element-wise multiplication of matrices); Wh, Wd, Ww are learnable parameters reflecting the influence of the recent, daily-cycle and weekly-cycle temporal characteristics on the prediction target.
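Eq. (11) is a simple element-wise weighted sum. A minimal sketch with assumed random component outputs and weights (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(0)
N, Tp = 4, 3                                # nodes, prediction horizon (assumed)

# Outputs of the recent, daily-cycle and weekly-cycle components.
Yh, Yd, Yw = (rng.random((N, Tp)) for _ in range(3))
# Learned fusion weights; in training these are free parameters.
Wh, Wd, Ww = (rng.random((N, Tp)) for _ in range(3))

Y = Wh * Yh + Wd * Yd + Ww * Yw             # '*' is the Hadamard product
print(Y.shape)                              # (4, 3)
```

Because the weights are full (N × Tp) matrices rather than scalars, every node and prediction step gets its own mix of the three components, which is exactly the per-node flexibility motivated above.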