Deep Learning-Based Hybrid Model for the Behavior Prediction of Surrounding Vehicles Over Long-time Periods

Autonomous vehicles need to predict the future behavior of surrounding vehicles to support proper trajectory planning and tracking. Many behavior prediction methods have limited application because their prediction horizon is very short. This paper proposes a deep learning-based hybrid model for behavior prediction over long-time periods, comprising a maneuver recognition module and a behavior prediction module. In the former module, a CNN extracts the social characteristics of the target vehicle, and an LSTM outputs the maneuver probability vector, which is combined with the social features to form a contextual feature vector. In the latter module, an LSTM and an attention mechanism operate on the contextual feature vector to capture multi-time-step information within the behavior time window and predict the behavior of the target vehicle. Vehicle trajectory data collected from real cars and an open-source dataset were used for training and testing. The results show that the proposed algorithm predicts vehicle behavior with an accuracy of 89.73% and an average prediction time of 2.032 s, which gives it high engineering application value.


Introduction
Autonomous vehicles (AVs) have received extensive research interest in recent years because they can enhance road safety, ease road congestion, decrease fuel consumption, and free human drivers [1][2]. Developments in the field will increase in both quality and importance with time [3]. However, how the AV can predict the behavior of nearby road-users is a tricky and considerable issue in the increasingly complex and highly uncertain traffic environment [4]. The approach proposed here could improve the ability of AVs to understand the traffic environment and help with proper trajectory planning [5] and tracking [6].
One part of this problem is to predict the behavior of pedestrians, which is well studied in the computer vision literature [7][8][9]. There are also several review papers on pedestrian behavior prediction [10]. Another part of the problem is the prediction of other vehicles' future behavior on the road [11]. The two parts are related but differ in important ways. For example, vehicle trajectories share certain similarities within the same typical behavior. Furthermore, roads for vehicles have a clear lane structure and movement direction. In this paper, we focus on vehicle behavior prediction because it is made more difficult by the higher randomness and mobility of these road-users, especially over long-time periods.
Earlier researchers focused on rule-based methods for vehicle behavior prediction. These methods tended to use some rules to judge future vehicle behavior. Houenou et al. [12] predicted behaviors, such as lane keeping ( [...] Crossing (TTC) to predict by rule matching whether the vehicle will depart from the current lane or not [13]. These methods can generate good results in specific driving environments. However, they require a lot of prior knowledge to select model parameters and thresholds, which makes them difficult to apply to long-range prediction.
Traditional machine learning-based methods are an effective remedy that relieves this engineering burden, because the behavior prediction model can be learned directly from drivers' behavior labels using supervised learning techniques, and no hand-crafted rules are needed.
Mandalia et al. [14] used a support vector machine (SVM) to classify LC behavior, which reflects the inter-class differences between samples of different behavior categories well; however, this type of method ignores the gradually evolving nature of time series and cannot reflect the coherence of a vehicle state sequence. Dynamic Bayesian networks (DBN) are widely used in perceptual prediction problems. Agamennoni et al. [15] proposed a Bayesian motion prediction model, but the application of such models in AVs is limited by their high computational cost. Li et al. [16] separately established hidden Markov models (HMMs) for each behavior, while Deng et al. [17] proposed a driving behavior prediction method based on an improved HMM, which uses a genetic algorithm to construct a pre-filter that optimizes the feature input of the prediction algorithm. However, these methods assume that the state of the vehicle at each time step is related only to the previous time step, without using contextual information, so their long-time prediction performance is poor.
Different from rule-based and traditional machine learning-based methods, deep learning-based methods have received increasing attention in recent years, and they show superiority in highly complex and non-linear scenarios.
Recurrent neural network (RNN) architecture has been widely used in the field of time series data analysis.
However, it is difficult to train these networks to learn long sequences in practice because of vanishing or exploding gradients [18]. The long short-term memory (LSTM) model is a variant of the RNN that adds a cell structure to the network to address this problem. However, in previous LSTM-based vehicle behavior prediction algorithms [19][20][21], the last or the average state value in the hidden layer is usually used as the high-level representation for vehicle behavior prediction. Because features contribute unequally to the prediction result, some information is lost as data passes through the hidden layer units. Although these algorithms perform well over short-time periods, their accuracy over long-time periods falls short of requirements.
To make wise decisions, the planning module of AVs needs to reason about long-term future outcomes, which requires predicting future behaviors three to five seconds ahead [22]. A shortcoming of the aforementioned works is that they do not consider the impact of the vehicle's current maneuver on its future behavior trend, which contributes to low accuracy in long-term prediction. This limits the application of these algorithms because it leaves inadequate time and space for planning. The current maneuver of the vehicle is important prior knowledge for characterizing the vehicle's future behavior over long-time periods. Nachiket et al. [23] proposed a trajectory prediction framework based on a double-layer model that first recognizes the maneuvering modes of vehicles on the highway and then uses them as a basis to predict their future multi-step positions, which significantly reduces the mean square error between the predicted trajectory and the real trajectory.
Inspired by this, the proposed method uses a deep learning-based hybrid model for the behavior prediction of surrounding vehicles over long-time periods. The current maneuver of the vehicle is considered to be a low-level expression of the vehicle's future behavior. The probability vector output from the maneuver recognition module is combined with the vehicle's social features to construct a contextual feature vector. An attention mechanism, which has performed well in vision and translation tasks [24][25], is also integrated to select key information from the entire contextual feature sequence of the vehicle.
The main contributions of this paper can be summarized as follows:
(1) The neighborhood traffic information of the target vehicle is considered so that the interaction between traffic participants is captured, and the social characteristics of the target vehicle are input into a convolutional neural network (CNN) to extract valid information and output the current maneuver probability vector of the target vehicle.
(2) The current maneuver of the predicted vehicle is considered, a prediction model with an LSTM network as its main body is built, and an attention mechanism is introduced to improve the prediction accuracy.
(3) Natural vehicle trajectory data collected from real vehicles and the highD dataset are used to generate the training and test sample sets, and the performance of the proposed method is verified.
The remainder of this paper is organized as follows: Section 2 proposes the deep learning-based hybrid method for behavior prediction over long-time periods. Section 3 introduces the dataset, data processing methods, and experimental details. Section 4 discusses the training results for each prediction task. Section 5 concludes this paper.

Methodology
As shown in Fig. 1, the framework of the proposed method consists of the maneuver recognition module and the behavior prediction module. In the maneuver recognition module, the LSTM network outputs the maneuver probability vector, which forms a contextual feature vector with the social features.

Social Characteristics
For an autonomous driving system or a human driver, vehicle behavior originates from intention and expected benefit, and it is inevitably influenced by interaction with other surrounding vehicles. Additional information is needed to reduce future uncertainty. For the model to better understand this interaction, the resulting probability distributions of maneuver recognition and behavior prediction depend on the trajectory history of both the target vehicle and the surrounding vehicles. Therefore, the feature data of the algorithm includes information on the target vehicle itself and on its environment. The vehicle social features consist of three parts: the status information of the target vehicle itself, the information of its surrounding vehicles, and road information. The status information of the target vehicle comprises the lateral distances between the target vehicle and the left and right lane lines, the heading angle of the target vehicle, and the lateral speed of the target vehicle.
In this experiment, we considered the eight vehicles around the target vehicle to be its interaction objects: the front, rear, left-front, left-adjacent, left-rear, right-front, right-adjacent, and right-rear vehicles. For each of these positions, the features are the lateral relative distance, the longitudinal relative distance, and the relative longitudinal speed between the vehicle at that position and the target vehicle.
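The feature layout described above can be sketched as a flat vector: four values for the target vehicle followed by three values for each of the eight neighbor positions. This is a minimal illustration, not the authors' implementation. The class names, field names, slot ordering, and the zero fill for an empty neighbor position are all assumptions, and road information is left out because its contents are not listed in the text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Slot order is an assumption; the text only names the eight positions.
NEIGHBOR_SLOTS = [
    "front", "rear",
    "left_front", "left_adjacent", "left_rear",
    "right_front", "right_adjacent", "right_rear",
]


@dataclass
class TargetState:
    """Status of the target vehicle (hypothetical field names)."""
    dist_left_line: float   # lateral distance to the left lane line
    dist_right_line: float  # lateral distance to the right lane line
    heading_angle: float
    lateral_speed: float


@dataclass
class NeighborState:
    """State of one surrounding vehicle relative to the target vehicle."""
    d_lateral: float       # lateral relative distance
    d_longitudinal: float  # longitudinal relative distance
    d_long_speed: float    # relative longitudinal speed


def social_features(target: TargetState,
                    neighbors: Dict[str, NeighborState],
                    missing: float = 0.0) -> List[float]:
    """Concatenate target and neighbor features into one flat vector."""
    vec = [target.dist_left_line, target.dist_right_line,
           target.heading_angle, target.lateral_speed]
    for slot in NEIGHBOR_SLOTS:
        n = neighbors.get(slot)
        if n is None:
            # Assumption: an empty position is filled with a constant.
            vec.extend([missing] * 3)
        else:
            vec.extend([n.d_lateral, n.d_longitudinal, n.d_long_speed])
    return vec
```

With this layout, one time step yields 4 + 8 × 3 = 28 values before any road features are appended.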
For road information,

Maneuver Recognition Module
The maneuver recognition module focuses on the [...] Convolutional layer 1 and convolutional layer 2 both had 64 convolution kernels of size 3.
To retain more data fluctuation information, the pooling layer uses one-dimensional maximum pooling (MaxPooling1D) as follows:
$$x_3 = \mathrm{pool}(x_2) + b_3$$
where $x_3$ is the output of the pooling layer, $x_2$ is the input from convolutional layer 2, $\mathrm{pool}(\cdot)$ represents the pooling operation, and $b_3$ is the offset.
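To make the two operations concrete, the following pure-Python sketch applies a single size-3 kernel to a one-dimensional sequence and then performs one-dimensional max pooling. It is illustrative only: the function names are hypothetical, the pool size of 2 and the absence of padding are assumptions (the text gives neither), and a real layer would hold 64 such kernels with learned weights.

```python
from typing import List, Optional


def conv1d_valid(seq: List[float], kernel: List[float],
                 bias: float = 0.0) -> List[float]:
    """Slide one kernel over a 1-D sequence (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(seq) - k + 1)
    ]


def max_pool1d(seq: List[float], pool_size: int = 2,
               stride: Optional[int] = None) -> List[float]:
    """Keep the maximum of each window, which preserves peaks in the signal."""
    step = stride if stride is not None else pool_size
    return [
        max(seq[i:i + pool_size])
        for i in range(0, len(seq) - pool_size + 1, step)
    ]
```

Max pooling keeps the largest activation in each window, which matches the stated goal of retaining fluctuation information that average pooling would smooth out.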

Behavior Prediction Module
The prediction module also targets the future lateral behavior of the vehicle. It should be noted that the lateral behavior here is not the same as the maneuver because they are labeled differently. The output category [...] The output of the attention layer at time $t$ is
$$c_t = \sum_{i=1}^{T} \alpha_{t,i} h_i$$
where $h_i$ is the LSTM hidden state at time step $i$ of the behavior time window, $\alpha_{t,i}$ is its attention weight, and $T$ is the number of time steps.
The attention layer is followed by a fully connected layer with 20 units, and the SoftMax function is used to obtain the vehicle behavior prediction probability vector:
$$p = \mathrm{SoftMax}(W c_t + b)$$
where $W$ is the weight for the connection, and $b$ is the offset. Finally, the behavior with the maximum probability is taken as the prediction result of the vehicle behavior as follows:
$$\hat{y} = \arg\max(p)$$
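The attention step and the final decision can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions: the attention scores are taken as given (the text does not say how they are computed), the fully connected layer is omitted, and the label order `("LK", "LCL", "LCR")` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[List[float]],
                   scores: Sequence[float]) -> List[float]:
    """Weighted sum of per-time-step hidden states.

    The weights are the softmax of the given scores, so they sum to 1.
    """
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]


def predict_behavior(class_logits: Sequence[float],
                     labels: Sequence[str] = ("LK", "LCL", "LCR"),
                     ) -> Tuple[str, List[float]]:
    """Return the most probable behavior and the full probability vector."""
    probs = softmax(class_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs
```

Unlike taking only the last or the average hidden state, the weighted sum lets time steps with higher scores contribute more to the context vector.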

Dataset
The vehicle trajectory dataset used in this paper was

Data Preprocessing
The vehicle trajectory data was obtained directly or [...] The output of the prediction algorithm is the probability of each of the three behavior categories, and the predicted behavior is the category with the maximum probability.
Based on the vehicle trajectory dataset, 27,000 sample sequences were extracted. Since straight-line driving occurred much more often than LC, the number of extracted LK sequences was much larger than the number of LC sequences. To prevent overfitting in the training process, the same number of sequences was selected for each behavior category (9,000 per category), and the sample set was randomly divided into a training set and a test set at a ratio of 8:2.
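The balancing and splitting procedure can be sketched as follows. This is one plausible reading of the description rather than the authors' code: the function name, the fixed random seed, and the choice to shuffle all classes together before splitting are assumptions.

```python
import random
from typing import Dict, List, Tuple


def balance_and_split(samples_by_class: Dict[str, List],
                      per_class: int,
                      train_ratio: float = 0.8,
                      seed: int = 0,
                      ) -> Tuple[List[tuple], List[tuple]]:
    """Draw the same number of samples per class, then split randomly."""
    rng = random.Random(seed)
    pool = []
    for label, samples in samples_by_class.items():
        if len(samples) < per_class:
            raise ValueError(f"class {label!r} has fewer than {per_class} samples")
        # Undersample every class to the same size.
        pool.extend((s, label) for s in rng.sample(samples, per_class))
    rng.shuffle(pool)
    cut = int(round(len(pool) * train_ratio))
    return pool[:cut], pool[cut:]
```

With 9,000 sequences in each of three categories and a ratio of 8:2, this yields 21,600 training samples and 5,400 test samples.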

Implementation Details
To reduce the impact of the value range and unit of each raw feature, min-max normalization was used to scale the raw data to between 0 and 1 as follows:
$$\tilde{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original data, $\tilde{x}$ is the normalized data, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sample data. To facilitate training of the model network, one-hot vector encoding was also used for the current maneuver and future behavior labels of the sample set.
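Both preprocessing steps are short enough to show directly. The sketch below is illustrative: the function names are hypothetical, and returning zeros for a constant feature (where the maximum equals the minimum) is an assumption made to avoid division by zero.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(label: str, classes: Sequence[str]) -> List[int]:
    """Encode a label as a vector with a single 1 at the label's position."""
    return [1 if c == label else 0 for c in classes]
```

For example, `min_max_normalize([2, 4, 6])` gives `[0.0, 0.5, 1.0]`, and `one_hot("LCL", ["LK", "LCL", "LCR"])` gives `[0, 1, 0]`.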
The maneuver recognition module uses multi-class cross-entropy as the loss function:
$$L = -\frac{1}{N}\sum_{i=1}^{N} \hat{y}_i \log y_i$$
where $\hat{y}_i$ is the actual probability that sample sequence $i$ corresponds to the current maneuver category of the vehicle; $y_i$ is the recognition probability of the current maneuver category of the vehicle for sample sequence $i$; and $N$ is the batch size of the maneuver recognition module, which is set to 100 in the experiment.
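As a worked illustration of this loss, the sketch below computes the batch-averaged multi-class cross-entropy from one-hot labels and predicted probability vectors. The function name and the clipping constant `eps` are assumptions added for numerical safety.

```python
import math
from typing import Sequence


def categorical_cross_entropy(y_true: Sequence[Sequence[float]],
                              y_pred: Sequence[Sequence[float]],
                              eps: float = 1e-12) -> float:
    """Mean over the batch of -sum(true * log(predicted)) across classes."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        # Clip predictions away from 0 so log() stays finite.
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)
```

For a single sample whose true category receives probability 0.5, the loss is -log(0.5), about 0.693; it falls to 0 as that probability approaches 1.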
The behavior prediction samples concern whether the target vehicle will change lanes in the future. Because of the driver's LC habits and the vehicle's LC response, some sequence samples far from the LC point are hard to classify. Therefore, focal loss was introduced as the loss function as follows: [...] where $\hat{y}$ is the actual probability of the vehicle behavior category corresponding to a sample sequence; [...]
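For reference, focal loss in its commonly used form scales each cross-entropy term by (1 - p)^gamma, so that well-classified samples contribute less and hard samples dominate the gradient. The sketch below follows that common form and is not taken from this paper: the function name and the defaults `gamma=2.0` and `alpha=1.0` are assumptions, since the paper's own parameter values are not given in the text.

```python
import math
from typing import Sequence


def focal_loss(y_true: Sequence[Sequence[float]],
               y_pred: Sequence[Sequence[float]],
               gamma: float = 2.0,
               alpha: float = 1.0,
               eps: float = 1e-12) -> float:
    """Batch-averaged focal loss for one-hot labels.

    With gamma = 0 and alpha = 1 this reduces to plain cross-entropy.
    """
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0)  # keep log() finite
            total -= alpha * t * (1.0 - p) ** gamma * math.log(p)
    return total / len(y_true)
```

With gamma = 2, a sample predicted at 0.9 for its true category is down-weighted by a factor of 0.01, while one predicted at 0.3 keeps a factor of 0.49.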

Vehicle Maneuver Recognition Test Results
The performance of the maneuver recognition module has a significant impact on the quality of vehicle behavior prediction. To test it, a sample was counted as successful if the maneuver category corresponding to the maximum value in the probability vector output by the module was the same as the actual labeled maneuver. In this paper, an SVM and an HMM were used as benchmarks, and the recall and overall accuracy of the three models were analyzed and compared. [...] were above 90%, which was higher than that of the LK maneuver. A small part of the data in some LK maneuver samples had large jitter, which caused the maneuver recognition module to misjudge them as LC maneuvers, and thus the recall of the straight-line (LK) maneuver category was lower than that of the other two maneuver categories.
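The success criterion and the two reported metrics can be expressed compactly. The sketch below is illustrative, with hypothetical function names: a sample counts as correct when the index of the largest probability matches the labeled category, recall is computed per category, and accuracy over all samples.

```python
from typing import Dict, List, Sequence, Tuple


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(probs)), key=probs.__getitem__)


def recall_and_accuracy(prob_vectors: Sequence[Sequence[float]],
                        true_labels: Sequence[str],
                        classes: Sequence[str],
                        ) -> Tuple[Dict[str, float], float]:
    """Per-class recall and overall accuracy from probability outputs."""
    hits = {c: 0 for c in classes}
    totals = {c: 0 for c in classes}
    for probs, truth in zip(prob_vectors, true_labels):
        totals[truth] += 1
        if classes[argmax(probs)] == truth:
            hits[truth] += 1
    # Recall is only defined for classes that appear in the labels.
    recall = {c: hits[c] / totals[c] for c in classes if totals[c]}
    accuracy = sum(hits.values()) / len(true_labels)
    return recall, accuracy
```

Per-class recall exposes the effect described above: an LK sample misjudged as LC lowers the LK recall without affecting the recall of the LC categories.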

Vehicle Behavior Prediction Test Results
To reflect the performance of the algorithm proposed [...] behavior. Another concern is that LC behavior needs to be predicted as early as possible to give decision-making and [...] issue because, for many decision-making systems, not only the behavior categories need to be predicted but also the predicted probability distribution of each behavior category.
As shown in Fig. 10, the vehicle was about to complete an LC to the right. In contrast to the method in this paper, the NULL-LSTMAT method in the figure did not consider the current maneuvering characteristics of the vehicle. In the figure, the abscissa is the time remaining until the LC point (Time To Lane Change, TTLC), and the ordinate is the predicted probability of each behavior category.
As shown, although there was not much difference between the two methods long before the LC point, the predicted probability distribution of the proposed method shifted toward the ground-truth category about 3 s before the LC.
The current maneuvering characteristics of the target vehicle enriched the prior knowledge of the behavior prediction algorithm, which increased the probability value of LCR and decreased the probability value of LCL. This also indicates that our method predicts LCs earlier.

Conclusion
This