Robust-LSTM: a novel approach to short-traffic flow prediction based on signal decomposition

Intelligent transport systems need accurate short-term traffic flow forecasts. However, developing a robust short-term traffic flow forecasting approach is a challenging task due to the stochastic character of traffic flow. This study proposes a novel approach for the short-term traffic flow prediction task, namely Robust Long Short-Term Memory (R-LSTM), based on the Robust Empirical Mode Decomposition (REMD) algorithm and Long Short-Term Memory (LSTM). Short-term traffic flow data obtained from the Caltrans Performance Measurement System (PeMS) database were used in the training and testing of the model. The dataset was composed of traffic data collected by 25 traffic detectors on the main lanes of different freeways. The time resolution of the dataset was set to 15 min, and the Hampel preprocessing algorithm was applied for outlier elimination. The R-LSTM predictions were compared with those of state-of-the-art models, utilizing RMSE, MSE, and MAPE as performance criteria. Performance analyses for various periods show that R-LSTM is remarkably successful in all time periods. Moreover, the developed model's performance is significantly higher, especially during midday periods when traffic flow fluctuations are high. These results show that R-LSTM is a strong candidate for short-term traffic flow prediction and can easily adapt to fluctuations in traffic flow. In addition, robust models for short-term prediction can be developed by applying signal decomposition methods to traffic flow data.


Introduction
Intelligent Transport Systems (ITS) aim to increase the efficiency of transportation infrastructures by using technology and intelligent systems. Achieving this goal depends primarily on gathering traffic information from the field on time. However, current information alone is not sufficient for most control decisions, because ITS also requires information on future traffic for driver warning systems and various control decisions. Therefore, further studies are essential to develop advanced models that can predict traffic parameters accurately.
Traffic flow is a crucial parameter for road network control decisions. However, traffic flow is a random phenomenon, and developing a reliable short-term prediction model is challenging because of this randomness. Over the past years, various approaches have been attempted to develop a robust short-term traffic model, such as time series methods (Han et al. 2004; Zeng et al. 2008; Wang et al. 2017; Dogan 2018, 2020a), Support Vector Machines (SVM) (Zhao-sheng et al. 2006; Yang and Lu 2010; Zhang et al. 2011; Feng et al. 2018), and genetic algorithms (Abdulhai et al. 2002; Vlahogianni et al. 2005; Xu et al. 2016). However, a robust approach is still lacking. The Artificial Neural Network (ANN) is a promising tool for developing such an approach and has been used in many studies (Vlahogianni et al. 2005; Hu et al. 2008; Sun and Liu 2008; Dogan 2020b; Yao et al. 2020). However, the exploding and vanishing gradient problem caused by the structure of ANNs blocked the development of more effective approaches. Hochreiter and Schmidhuber (1997) overcame this difficulty with the Long Short-Term Memory (LSTM) network.
Although LSTM is an effective approach for revealing patterns in a time series, the traffic flow signal on a road section is a superposition of upstream sub-signals arriving from different directions. Using this superposed signal directly can prevent the development of more robust prediction models. However, most approaches developed in recent years have proposed more complex and hybrid structures to detect patterns in the dataset and have ignored this phenomenon (e.g., Xiao and Yin 2019; Yang et al. 2019; Lu et al. 2020; Zhaowei et al. 2020). These models provide successful results, but their complex nature makes them difficult to understand and apply to new problems. To overcome this problem, traffic flow data should first be decomposed into sub-signals, and separate models should be developed for each signal. Thus, it becomes easier to detect patterns in the flow, and model complexity is reduced.
This study aims to develop an approach that can adapt to different flow conditions by decomposing short-term traffic flow into sub-signals, and presents the Robust-LSTM (R-LSTM) approach, which combines the REMD method (an enhanced version of EMD) with an LSTM deep network. A few studies in the literature use a decomposition method and LSTM together, for example, forecasting the North Atlantic Oscillation index (Yuan et al. 2019), web service recommendation (Singh et al. 2019), and short-term rail passenger forecasting (Chen and Li 2019). However, this approach has not been explored sufficiently for the traffic flow prediction problem. Yu et al. (2019) used a time series decomposition method in a deep learning network that includes an LSTM network. However, the model presented in that study comprises many methods and their interrelationships, so it is not clear whether the high performance stated in the results is due to the decomposition process. In another study, researchers decomposed traffic flow into periodic and volatile components and used LSTM and hybrid models to predict traffic flow. Analysis showed that the decomposition approach gives more accurate results. However, in that study traffic flow was only separated by time series methods according to periodicity and volatility. In the present study, REMD, which is a signal processing method, is used instead of a time series method, and the signal is divided into sub-signals. Thus, a more accurate short-term traffic flow forecasting model is developed.
The main contributions may be summarized as follows:
• A low-complexity short-term traffic flow method, R-LSTM, was developed that combines REMD's accurate signal decomposition with LSTM's ability to capture signal patterns.
• The effectiveness of R-LSTM was demonstrated by performing detailed analyses on a substantial number of datasets and comparing R-LSTM's performance with the latest models.
• Analyses that help determine appropriate IMF numbers for traffic flow signals were performed, and the results were discussed.
The remainder of the article is organized as follows: information about the structure of REMD and R-LSTM is given in the methodology section. Descriptive statistics of the datasets and the model training parameters are discussed in the experimental setup section. Information about determining suitable models and comparisons with state-of-the-art models is in the experimental results section. The last section includes a discussion of the results and suggestions.

Traffic data pre-processing
Traffic data are collected by several types of detectors placed on roads. As a result of the malfunction of these devices, errors, deficiencies, or inconsistencies occur in the data. For this reason, the data should be pre-processed before proceeding with the model development.
Hampel filter has been effectively applied to studies in various areas (Allen 2009;Allen et al. 2010;Pearson et al. 2015;Ghaleb et al. 2018). Therefore, the Hampel filter was applied to the time-dependent traffic flow datasets before the decomposition process. The Hampel filter identifies an outlier in time series and replaces it with an appropriate value (Hampel 1974). Thus, the generalization capability and performance of the model developed by using these series increase.
Hampel filter processing steps include calculating the local median (Eq. (2)) and standard deviation (Eq. (3)) values and determining outliers by using a threshold value (h).
Let X(t) = {x_t ∈ R+ | t ≤ N, N ∈ N+} be the main dataset and c be the sliding window half-length, so that the local dataset around x_j is (x_{j−c}, …, x_{j+c}). The local median is m_j = median(x_{j−c}, …, x_{j+c}) (Eq. (2)), and the local standard deviation is estimated robustly as s_j ≈ 1.4826 · median(|x_i − m_j|), i = j−c, …, j+c (Eq. (3)). Finally, x_j is tested using Eq. (4): if |x_j − m_j| > h·s_j, the Hampel filter declares x_j an outlier and replaces it with m_j. After the outliers were removed, the datasets were standardized using Eq. (5) for LSTM model training: letting X̄ be the mean and σ the standard deviation of X, the standardized value of x_j is z_j = (x_j − X̄)/σ.
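The filtering steps above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB implementation; the parameter names are assumptions, and 1.4826 is the standard constant that turns the median absolute deviation into a Gaussian-consistent standard deviation estimate.

```python
import numpy as np

def hampel_filter(x, window=3, h=3.0):
    """Replace outliers with the local median (illustrative sketch).

    window : half-width c of the sliding window
    h      : threshold in units of the robust standard deviation
    """
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    for j in range(n):
        lo, hi = max(0, j - window), min(n, j + window + 1)
        local = x[lo:hi]
        m_j = np.median(local)                          # local median, Eq. (2)
        s_j = 1.4826 * np.median(np.abs(local - m_j))   # robust std, Eq. (3)
        if np.abs(x[j] - m_j) > h * s_j:                # outlier test, Eq. (4)
            x[j] = m_j
    return x
```

With h = 3, a sample is replaced whenever it lies more than three robust standard deviations from the local median; a smaller h filters more aggressively.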

Robust empirical mode decomposition
Empirical mode decomposition (EMD) was developed by Huang et al. (1998) for decomposing mixed signals and has been applied successfully to many theoretical and practical subjects. EMD performance depends significantly on the adjustment of the EMD parameters. The sifting stop criterion (SSC) is one of these parameters and plays an important role in improving EMD performance. Therefore, efforts were made to develop effective SSC methods (Huang et al. 1998; Rilling et al. 2003; Qiwei et al. 2007). However, these methods require a predefined threshold to stop sifting, and expertise is required to set this threshold properly. Such approaches are referred to as hard sifting stopping criteria (H-SSC). To overcome this drawback, an intuitive soft sifting stopping criterion (S-SSC) was proposed to determine the number of sifting iterations (Liu et al. 2017b). Unlike H-SSC, S-SSC does not require any predefined threshold value and is more accurate. Robust empirical mode decomposition (REMD) is a signal mode decomposition method using S-SSC. Thus, REMD can perform signal decomposition more effectively than the traditional EMD method.
Algorithm 1 (Alg. 1) shows the steps for decomposing the signal X into IMFs plus a residual. REMD begins by taking X as the residual signal (Alg. 1, Line 2). Next, this residual signal is checked for monotonicity. If it is not monotonic, the signal is decomposed: first, the local average is determined by the Local Mean Function (LMF), whose steps are given in Alg. 2, and a new IMF candidate is extracted arithmetically from the previous one. This sifting continues until the S-SSC decides to terminate the process, yielding the next IMF. These steps are repeated until the residual becomes monotonic.
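The outer loop of Algorithm 1 can be sketched as below. This is a simplified Python illustration rather than the authors' implementation: the local mean is taken as the average of cubic-spline envelopes through the local extrema, and a hard iteration cap stands in for the S-SSC termination decision described later.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def local_mean(x):
    """Mean of upper/lower cubic-spline envelopes (simplified LMF sketch)."""
    n = np.arange(len(x))
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: signal is (near-)monotonic
    upper = CubicSpline([0] + maxima + [len(x) - 1],
                        [x[0]] + [x[i] for i in maxima] + [x[-1]])(n)
    lower = CubicSpline([0] + minima + [len(x) - 1],
                        [x[0]] + [x[i] for i in minima] + [x[-1]])(n)
    return (upper + lower) / 2.0

def emd(x, max_imfs=4, max_sift=20):
    """Decompose x into IMFs plus residual (hard-capped sifting, not S-SSC)."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if local_mean(residual) is None:   # residual monotonic: stop
            break
        h = residual.copy()
        for _ in range(max_sift):          # sift until the cap is reached
            m = local_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)                     # next IMF
        residual = residual - h
    return imfs, residual
```

By construction the IMFs and the residual sum back to the original signal, which is a useful sanity check for any decomposition implementation.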
The sifting stopping criterion dictates the number of sifting iterations in a sifting process, and EMD decomposition is easily affected by the parameters of this criterion. Therefore, EMD performance benefits from a soft sifting stopping criterion that determines the iteration number with an adaptive approach.
The S-SSC steps used in this study were explained in detail in two previous studies (Liu et al. 2017a; Peng et al. 2019), so they are only briefly described in this section. Let h_ik(n) be the signal of the ith IMF after the kth sifting and m_ik(n) the local mean of h_ik(n), where n = 1, 2, …, N_s and N_s is the total number of points in h_ik. The S-SSC steps are briefly as follows.
4. If f_{i,k+1} > f_{i,k} and f_{i,k+2} > f_{i,k+1}, stop the sifting process and use the (k−1)th results; otherwise, the sifting process continues until k reaches a predefined maximum iteration value.
where m̄ is the arithmetic mean of m(n).
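Assuming the criterion tracks an objective value f_{ik} per sift (the function name below is hypothetical), step 4 amounts to stopping once the objective worsens for two consecutive sifts:

```python
def should_stop(f):
    """S-SSC-style stop test on the objective series f[0..k] (sketch):
    stop when f has increased over the last two consecutive sifts."""
    return len(f) >= 3 and f[-2] > f[-3] and f[-1] > f[-2]
```

When this returns True, the sifting result from two iterations back, i.e. the (k−1)th, is retained, since the later sifts only degraded the objective.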

Naïve-LSTM and SVR (baseline models)
An LSTM (Naïve-LSTM) network is a type of deep neural network consisting of input, output, and hidden layers, as seen in Fig. 1. Hidden layers are formed by LSTM units and their connections. The LSTM unit has customized vectors and gates to capture correlations within time series. LSTM overcomes the problem of vanishing/exploding gradients by using input, output, and forget gates (Han et al. 2004). Thus, the weight of relevant information from previous iterations increases, and other information is forgotten. These advanced features make the LSTM method a strong candidate for developing a short-term traffic prediction model. An LSTM network consists of sequentially connected LSTM units, as shown in Fig. 1. The LSTM unit at the current step (t) accepts the previous step's (t−1) cell information vector (c_{t−1}) and prediction value (h_{t−1}) as input variables. It also processes the data point value at the current step (x_t). The input vectors pass through the sigmoid (σ) and hyperbolic tangent (tanh) functions. Through pairwise multiplication and addition operations, these vectors create the vectors h_t (the LSTM prediction for time t) and c_t (the cell state for time t) for the next step. This process is applied to every point in the dataset. As a result, after each step, the LSTM network becomes a better-trained prediction model.
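The data flow through one unit described above follows the standard LSTM equations and can be sketched as a single step in Python. This is a generic textbook formulation for illustration, not the MATLAB network used in the study; the stacked parameter layout is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM unit step (standard formulation, for illustration).

    W (4H x D), U (4H x H), b (4H,) stack the parameters of the forget,
    input, candidate, and output transforms for hidden size H.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # all four pre-activations at once
    f = sigmoid(z[0:H])                    # forget gate
    i = sigmoid(z[H:2 * H])                # input gate
    g = np.tanh(z[2 * H:3 * H])            # candidate cell state
    o = sigmoid(z[3 * H:4 * H])            # output gate
    c_t = f * c_prev + i * g               # new cell state c_t
    h_t = o * np.tanh(c_t)                 # new hidden state / prediction h_t
    return h_t, c_t
```

Because the output gate and tanh both saturate in (−1, 1), each component of h_t is bounded, which is part of what keeps gradients stable across many steps.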
The LSTM network is only briefly introduced in this manuscript; LSTM principles can be explored in detail in the original study (Hochreiter and Schmidhuber 1997). The Support Vector Machine (SVM) is a supervised learning algorithm that was initially developed for classification (Vapnik 2013). Later, Smola and Schölkopf (2004) demonstrated that this method can also be used to develop regression models. SVM regression (SVR) is similar to linear regression. However, linear regression uses all the data in the dataset to develop the model, whereas the SVR model decides whether to use an observation point by considering a certain ε distance (Fig. 2).
SVR aims to find the linear function f(x) given in Fig. 2. For this, the algorithm minimizes the coefficient vector (w) of f(x). This optimization objective function and its constraints are given in Eq. (9), where x and y are training data pairs, b is the linear function constant, and ε is the soft margin value. SVR is introduced only briefly here; detailed information can be obtained from many studies in the literature (Drucker et al. 1997; Smola and Schölkopf 2004; Awad and Khanna 2015).
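The ε-tube idea can be made concrete with the ε-insensitive loss that SVR minimizes alongside ||w||²: residuals that stay within the ε margin contribute nothing, so only points outside the tube become support vectors. A small illustrative sketch:

```python
import numpy as np

def epsilon_insensitive_loss(y, y_pred, eps):
    """SVR's per-sample loss: zero inside the eps-tube around f(x),
    linear in the excess deviation outside it."""
    return np.maximum(np.abs(y - y_pred) - eps, 0.0)
```

For example, with ε = 0.5 a prediction that misses by 0.3 incurs no loss, while one that misses by 1.0 is penalized only for the 0.5 that exceeds the tube.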
Data preprocessing details are discussed in Sect. 2.1. In this first step, the Hampel filter is applied to the traffic flow dataset; thanks to this preprocessing, the dataset is cleared of outliers, and missing data points are completed by the filter. This step avoids the reduction in model performance caused by inappropriate data samples. Then, the dataset is divided into two sets, the training set and the test set, as seen in Fig. 3.
The REMD process detailed in Sect. 2.2 decomposes the dataset and generates N new data signals, namely the IMFs, plus the residual signal. This process is applied to the training and test sets separately. The IMF and residual signals generated from the training set are used to train individual LSTM models. As a result, N + 1 trained LSTM models (see Fig. 3, LSTM-1*, LSTM-2*, etc.) are obtained for traffic flow prediction.
In the final step, the trained models are utilized to predict traffic flow. Let x_i(t) be the sample of the ith sub-signal for period t, so that x_1(t) ∈ IMF_1, …, x_N(t) ∈ IMF_N. These sub-signal samples are given as input to the trained LSTM models, and the predictions x_1(t+1), x_2(t+1), …, x_N(t+1) are obtained. To produce the traffic flow prediction for period t+1, these sub-signal predictions are added arithmetically, as seen in Fig. 3, and the process is terminated.
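The final combination step is a plain arithmetic sum of the per-sub-signal predictions, for example as below; the `predict` interface and model objects are hypothetical stand-ins for the trained LSTMs.

```python
def r_lstm_predict(models, subsignal_windows):
    """Sum per-sub-signal predictions into one flow forecast (sketch).

    models            : one trained predictor per IMF/residual signal,
                        each exposing predict(window) -> float
    subsignal_windows : the latest input window of each sub-signal
    """
    return sum(m.predict(w) for m, w in zip(models, subsignal_windows))
```

Because REMD guarantees the sub-signals sum to the original series, summing their forecasts is the natural way to recombine them into a flow forecast.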

Prediction errors criteria and evaluation periods
Various error criteria were used to measure the errors of the models developed in this study. Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Mean Squared Error (MSE) are frequently used in similar studies. The equations of these criteria are given in Eqs. (10)-(12).
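These criteria can be computed directly, for example as follows (Python used here purely for illustration):

```python
import numpy as np

def mape(actual, pred):
    """Mean Absolute Percentage Error, in percent (Eq. (10)-style)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

def mse(actual, pred):
    """Mean Squared Error; heavily penalizes large misses."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.mean((actual - pred) ** 2)

def rmse(actual, pred):
    """Root Mean Squared Error, in the units of the flow itself."""
    return np.sqrt(mse(actual, pred))
```

Note that MAPE divides by the actual flow, which is why it is especially sensitive to errors made at small traffic volumes, a point that matters in the IMF analysis later.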

Fig. 3 R-LSTM working mechanism
where n is the total number of test samples, x_i is the actual traffic flow, and x̂_i is the predicted flow.
Model performances diverge under different flow conditions. To interpret this situation, the test set was divided into three sub-periods, and each sub-period was examined separately (Fig. 4). In addition, the predictive capabilities of the models were calculated and compared for all samples in the test data. These sub-periods are explained as follows:
• Fluctuation period: Traffic flow and its fluctuation are highest in this period, which includes the peak-hour volumes occurring in the morning and evening hours (Fig. 4, red box).
• Linear period: In this period, the amount of traffic flow tends to decrease or increase linearly (Fig. 4, green box).
• Weekend period: This period covers the weekend days, when the flow pattern differs from that of weekdays (Fig. 4).

The traffic flow data
The traffic flow datasets used in this study were obtained from the Caltrans Performance Measurement System (PeMS) database. PeMS provides an easy-to-access traffic data resource. The PeMS system processes raw data, and the user has the option of using processed data or raw data. PeMS collects data with the help of numerous detectors on the highway and its connections; therefore, the desired amount of up-to-date data can be obtained from the field. In this study, traffic station data in District 3 were selected for developing the prediction models, and the IDs of these traffic stations are presented in Table 1. The data were collected from the main lanes of highways with continuous flow conditions. These data are presented in PeMS as a series with a time interval of 5 min; the time interval was converted to 15 min for use in this study. The highways where the traffic counting stations are located have different numbers of lanes. To avoid this complicating the comparisons, the total traffic flow was divided by the number of lanes. Traffic flow datasets obtained from 25 selected traffic detector stations in different regions were used for training and testing R-LSTM and the other approaches. Datasets with different statistical properties were preferred to better interpret the performance of the models. Descriptive statistics for the datasets are in Table 1.
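The 5-min-to-15-min conversion and the per-lane normalization described above can be sketched as follows. The flow values and lane count are made up for illustration, and real PeMS exports have different column names.

```python
import pandas as pd

# hypothetical raw export: total flow per 5-min interval for one station
df = pd.DataFrame(
    {"flow": [30, 32, 28, 35, 31, 30]},  # vehicles per 5 min
    index=pd.date_range("2019-04-22 00:00", periods=6, freq="5min"),
)
lanes = 3

# aggregate to the 15-min resolution used in the study ...
flow_15 = df["flow"].resample("15min").sum()
# ... and normalize by lane count so roads of different widths compare fairly
flow_15_per_lane = flow_15 / lanes
```

Summing (rather than averaging) the three 5-min counts preserves the meaning of flow as vehicles per interval; dividing by the lane count then yields flow per lane.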
The datasets contain traffic flow data covering 7 weeks (22 Apr 2019 to 09 Jun 2019), and each dataset has 4704 samples (1 sample per 15 min). Traffic flow oscillation is similar in consecutive weeks; therefore, the model should be tested on one or more weeks. For training, 4032 samples (the first six weeks) were used, and the remaining 672 samples (the final week) were reserved for testing.
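Under this scheme, the chronological split works out as follows; `series` is a stand-in for one detector's 15-min flow series.

```python
import numpy as np

series = np.arange(4704)                # stand-in for one 7-week flow series
samples_per_week = 7 * 24 * 4           # 672 samples per week at 15-min steps
train = series[:6 * samples_per_week]   # first six weeks: 4032 samples
test = series[6 * samples_per_week:]    # final week: 672 samples
```

Splitting chronologically (rather than randomly) keeps the test week strictly in the future of the training data, which matches how the model would be used in practice.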

Model training parameters
Determining the strength and effectiveness of the R-LSTM model is important. Therefore, the Naïve-LSTM and SVR models were used as baseline models for comparison. Naïve-LSTM is a model that does not use REMD and processes the raw dataset. SVR is an approach that has produced successful results in previous studies and has been applied to the traffic flow problem.
The proposed R-LSTM and the baseline models were developed using MATLAB. The number of hidden layer neurons and the maximum number of training steps for the N-LSTM and R-LSTM models were determined by trial and error as 200 and 250, respectively. Adam optimization was used for N-LSTM and R-LSTM training, and sequential minimal optimization (SMO) was used for SVR. The training parameters used for these processes are presented in Table 2.

Experimental results
In this section, first, the test results of the R-LSTM models developed using different numbers of IMFs were examined, and the results were discussed. Then, the appropriate R-LSTM model and the baseline models were compared considering the evaluation periods.

Determining the appropriate IMF number
Assigning an appropriate number of IMFs to the process improves R-LSTM performance significantly. Therefore, the effect of the IMF number on R-LSTM prediction errors was analyzed, and appropriate IMF numbers were determined in this section. Figure 5 illustrates the effect of the number of IMFs on R-LSTM prediction performance. The red horizontal line within each box shows the error mode. These modes are lowest at IMF = 1 for all criteria. In addition, increasing the number of IMFs increases the average error. The lowest and highest error values can be seen from the lower and upper whiskers. For MAPE, the lowest error occurred for one dataset when IMF = 2, where the error is around 2%. However, this value is only slightly less than the IMF = 1 MAPE, and the difference is not significant. The upper whiskers indicate the highest prediction error. The MAPE upper whiskers follow an exponential trend as the IMF number increases; in other words, the prediction accuracy of the model was significantly reduced for some datasets. There is also an upward trend in RMSE and MSE, but that trend is linear. These different trends are explained by the fact that the MAPE metric is sensitive to errors in predictions for small traffic flows.
MSE is a measure that is sensitive to large errors. In Fig. 5, the MSE upper whiskers show a lower error at IMF = 1 than for the other IMFs. In addition, the blue box representing 50% of the datasets shows smaller errors at IMF = 1 than the others. RMSE is the root of MSE and is therefore less susceptible to large errors; it gives good insight into the average error. The range between the upper and lower whiskers is remarkable for RMSE: the IMF = 1 range is close to the other IMF ranges. This observation shows that the error distributions of the IMFs are similar for RMSE. These analyses demonstrate that using one IMF is appropriate for the short-term traffic flow datasets used in this study. In this case, two LSTM models should be developed, and the prediction function is as follows (also check Eq. (9)). Increasing the number of IMFs in the REMD process decomposes the traffic flow data into more sub-signals. The IMF and residual signals produced by this REMD procedure are shown in Fig. 6. Increasing the number of IMFs smooths the residual signal. However, an excessive increase in the number of IMFs over-smooths the residual signal and distorts the signal pattern. This can be observed in Fig. 6 for IMF = 3 and 4. As a result, choosing a high IMF number can affect model performance negatively.

Comparison with baseline models
The R-LSTM is compared with the selected baseline models in this section. The baseline models were developed using two state-of-the-art approaches, N-LSTM and SVR. First, all data points in the test set were considered for the error calculations, i.e., the whole test period. Subsequently, the dataset was divided into the sub-periods described in Sect. 2.3, and the predictions of the models were analyzed. Table 3 shows the mean error criteria of the examined models. According to the RMSE criterion, the R-LSTM estimates are approximately two times more accurate than SVR, and approximately three times according to the MSE criterion. According to the MAPE criterion, R-LSTM is about 1.5% more accurate. Moreover, the MSE criterion shows that R-LSTM has about half the error of N-LSTM.
Consequently, R-LSTM is superior to SVR, and utilizing REMD remarkably enhances the prediction performance of N-LSTM. Figure 7 illustrates the errors of the three models for the datasets. The R-LSTM is more accurate than the other models, except for two datasets according to MAPE and one according to the other criteria. The error trends of the three models across the datasets are similar, but the MSE shows that R-LSTM made fewer large errors than the other models. For example, the MSE errors of SVR and N-LSTM in Fig. 7 are significantly larger for some datasets (e.g., dataset nos. 4, 7, 9). R-LSTM shows remarkable performance for the whole test period. However, a reliable model should also perform satisfactorily over different time periods. Thus, error analysis of the models was also performed for the sub-periods mentioned in Sect. 2.3.
The fluctuation period falls approximately in the middle of the day and includes the traffic rush hours. Therefore, it is crucial to enhance model performance for this period. In Fig. 8, all criteria show that the R-LSTM flow predictions are closer to the actual flow than those of the other models. The smallest MAPE of R-LSTM is about 2%, while for N-LSTM and SVR it is about 3.5% and 5%, respectively. These results indicate that R-LSTM adapts to traffic flow fluctuation better than the other models. RMSE and MSE also support this conclusion. Another remarkable point is the error range of R-LSTM for MSE: while the SVR upper whisker limit approaches 15,000, for R-LSTM it is around 4000. This shows that SVR and N-LSTM make relatively large errors, especially during large fluctuations.
The R-LSTM performance is also satisfactory for the linear period, but two cases are remarkable. In Fig. 9, the MAPE upper whisker of N-LSTM is about 13%, while it is 16% for R-LSTM. In other words, N-LSTM is better than R-LSTM at small flow predictions for some datasets. The ranges of these two models are also close in MSE, but the R-LSTM mode is about 500 lower than the N-LSTM mode. Based on the RMSE criterion, R-LSTM is significantly better than the others. The SVR approach produces large errors for all criteria. The N-LSTM performance is satisfactory for large and small flow values; however, a slightly better result was achieved with R-LSTM. This result indicates that N-LSTM can also be effective in this period.
The traffic flow pattern on weekends differs from that on weekdays. Also, traffic flow may be higher or lower than on weekdays depending on the classification of the road. Figure 10 illustrates the model prediction errors for the weekend. It is clear that the proposed model produced small errors for this period as well. The MAPE range of R-LSTM for the weekend is about 5%, which is lower than the MAPE of the linear period. The difference between the R-LSTM and N-LSTM modes is large in this period, similar to the fluctuation period. The SVR errors are also high for this period. These results show that R-LSTM adapts to fluctuations in traffic flow better than the other models.

Conclusion
A robust short-term traffic flow forecasting model can make a significant contribution to traffic management systems. However, a widely accepted approach has not yet been developed, as a significant number of factors influence the short-term value of traffic flow. Separating these factors from traffic flow data will contribute to increased model performance.
In this study, a novel approach called R-LSTM is proposed, which decomposes traffic flow using REMD into sub-signals (IMFs) and predicts short-term traffic flow utilizing IMFs and LSTM models. To our knowledge, this is the first study to combine a decomposition algorithm and LSTM for short-term traffic flow prediction. In previous studies, traffic data are usually preprocessed, but these operations are applied only to remove outliers, complete missing data, or smooth the data. In this study, the pattern capture capability of the LSTM is enhanced by the utilization of REMD in addition to the traditional data preprocesses.
R-LSTM performance evaluation was performed with data from 25 traffic counting stations using the MSE, RMSE, and MAPE criteria. In addition, R-LSTM performance was compared with the N-LSTM and SVR prediction performances. Moreover, model performances for traffic flow with different characteristics during the week, i.e., the sub-period performances, were also examined. The results show that using traffic flow sub-signals as model inputs in the training process significantly improves LSTM model performance, and R-LSTM is a promising alternative for short-term traffic flow prediction tasks. Valuable information was obtained by examining the sub-periods. In the fluctuation period, traffic flow oscillates remarkably, and the models have larger errors than in the other periods. However, R-LSTM is superior in the MSE criterion, especially in the fluctuation period. The adaptation of SVR to traffic oscillation is lower than that of the other models. The models' predictions in the linear and weekend periods are more accurate than in the fluctuation period. Consequently, the fluctuation period is the most challenging part for the models, and it gives an overall picture of a model's performance.
Despite the success of the proposed model, the study has some limitations. Datasets with different statistical properties were selected to train the models; however, more robust analysis results could be obtained by increasing the number of datasets. Another shortcoming is related to hyperparameters: during model training, hyperparameters were fixed to certain values, so the sensitivity of the models to hyperparameters should be discussed in further study. Some issues remain to be explored in future research. First, the robustness of the model will be revealed more clearly by addressing the deficiencies mentioned above. In addition, performing analyses on datasets with missing data and discussing the effectiveness of different decomposition methods together with LSTM are further studies that will contribute to the advancement of the model. Consequently, traffic flow signal decomposition enhances the quality of the LSTM model and provides more accurate short-term traffic flow predictions.
Funding No funding was received for conducting this study.
Data availability The datasets generated during and/or analyzed during the current study are available in the Caltrans Performance Measurement System (PeMS) repository, https://pems.dot.ca.gov/?dnode=Clearinghouse&type=station_5min&district_id=3&submit=Submit

Declarations
Conflict of interest The authors declare that they have no conflict of interest.