Prediction of Agricultural Commodities Futures Prices: A DQN-LSTM Method


This paper combines a deep Q-network (DQN) with long short-term memory (LSTM) and proposes a novel hybrid deep learning method called the DQN-LSTM framework. The proposed method aims to predict the prices of five Chinese agricultural commodities futures over different time durations. DQN-LSTM applies the policy-improvement mechanism of deep reinforcement learning to the optimization of the structural parameters of a deep recurrent network, achieving an organic integration of the two types of deep learning algorithm. The new framework is capable of self-optimization and parameter learning, and thus improves prediction performance through its own iteration, which shows great promise for financial prediction and other applications. The performance of the proposed method is evaluated by comparing the effectiveness of DQN-LSTM with that of traditional prediction methods such as the auto-regressive integrated moving average (ARIMA), support vector regression (SVR) and LSTM. The results show that DQN-LSTM can effectively optimize the structural parameters of a traditional LSTM through the policy iteration of the deep reinforcement learning algorithm, which contributes to better long- and short-term prediction accuracy. In particular, the longer the prediction period, the more obvious the accuracy advantage of the DQN-LSTM method.


As an important indicator of agricultural market prices, agricultural commodities futures prices not only provide people engaged in agricultural production and operation with more accurate information on long-term price fluctuations, but also serve as an important basis for hedging decisions for those involved in the agricultural sector. Although econometric methods offer clear explanations, it is difficult for them to accurately predict the complex and changeable agricultural commodities futures prices, because the data patterns have non-linear features.

With the flourishing of machine learning, the application of artificial intelligence has gradually extended. Compared with econometric methods, machine learning methods are widely used in financial time series prediction, as they can mine valuable information directly from data without pre-formulated assumptions and can better handle non-linear data. [13] and [14] used a Convolutional Neural Network (CNN) and a Back Propagation Neural Network (BPNN), respectively, to predict ETF prices. [15] used the BAT algorithm to predict copper price fluctuations; the experimental results showed that the BAT algorithm outperformed the classical prediction method. [16] and [17] used a Decision Tree algorithm and a Support Vector Machine (SVM), respectively, to predict the price of copper futures on the London Metal Exchange. Although traditional neural networks have good prediction capability, their accuracy is still not satisfactory when faced with dynamic and non-linear time series data.

In recent years, deep recurrent neural networks such as LSTM have shown prominent performance in time series prediction. Unlike traditional RNNs, LSTM introduces memory units in which the memory of previous inputs can be persistently stored in the internal states of the network, thus allowing an LSTM network to explore serial data such as time series.
An LSTM network with better prediction ability can also explore the abstract features inherent in the data, grasp the hidden structures in the data, and then process the time series. Deep reinforcement learning has been widely used in industrial manufacturing [23], path planning [24], and gaming [25], and can solve, to some extent, the challenge of sequential decision making in the selection of Dropout parameters [26]. Among the many deep reinforcement learning algorithms, DQN is the earliest one using deep neural networks; it has the advantages of a simple structure, relatively few hyperparameters and easy implementation, and has been widely used in multiple fields.

In response to the low prediction accuracy caused by overfitting in LSTM time series prediction, and to the randomness involved in selecting Dropout parameters to alleviate overfitting, this paper proposes a hybrid prediction method (called DQN-LSTM) combining DQN and LSTM that intelligently decides the Dropout parameters. In Eq. 1, the temporal-difference target value of the DQN is given as y = r + γ max_{a′} Q(s′, a′; θ⁻), where r is the reward obtained at the current step, γ is the discount factor, s′ is the state at the next time step and a′ the action taken, and Q(·, ·; θ⁻) is the target Q-network. θ is the parameter of the evaluation network, α is the learning rate, and m is the batch size sampled from an experience base consisting of all experiences.
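The temporal-difference target of Eq. 1 can be sketched as follows. This is a minimal illustration using a toy linear Q-function in place of the paper's deep networks; the network shapes and batch size are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the evaluation and target Q-networks (assumed linear here
# purely for illustration; the paper uses deep neural networks).
n_states, n_actions = 4, 3
theta = rng.standard_normal((n_states, n_actions))   # evaluation-network parameters
theta_target = theta.copy()                          # target-network parameters θ⁻

def q_values(states, params):
    """Q(s, a) for every action, for a batch of states."""
    return states @ params

def td_targets(rewards, next_states, dones, gamma=0.99):
    """y = r + γ · max_a' Q(s', a'; θ⁻), as in Eq. 1 (zero beyond terminal states)."""
    max_next_q = q_values(next_states, theta_target).max(axis=1)
    return rewards + gamma * max_next_q * (1.0 - dones)

# A batch of m = 5 sampled transitions.
rewards = rng.standard_normal(5)
next_states = rng.standard_normal((5, n_states))
dones = np.zeros(5)
y = td_targets(rewards, next_states, dones)
assert y.shape == (5,)
```

In training, the evaluation network's parameters θ would then be updated by gradient descent on the squared difference between Q(s, a; θ) and these targets y.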

As for experience replay, the data for DQN training is randomly drawn from an experience memory pool [27]. This pool stores the results of operating the system. Because the empirical data is highly continuous and correlated, it does not satisfy the requirement of independent and identical distribution of training data for neural network parameter training, which affects the training convergence of DQN; the experience replay method reduces the correlation of the data samples. To make use of high-quality experience more effectively, [28] proposed a priority experience replay mechanism, which first assigns a priority to each experience according to its performance, then ranks the experiences by priority, and preferentially selects the experiences with higher priority for neural network training when performing experience replay. A traditional RNN cannot retain valid historical information in the long term due to the influence of continuous data input. LSTM is able to deal with the long-term dependence of time-series data by introducing a gating mechanism to replace the nodes in RNNs.
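The priority-based sampling idea described above can be sketched as follows. This is a simplified version in which sampling probability is proportional to a stored priority; the full mechanism of [28] (rank-based prioritization with importance-sampling corrections) is more elaborate, and the buffer capacity and transition format below are assumptions.

```python
import random
from collections import deque

class PrioritizedReplay:
    """Minimal sketch of priority experience replay: transitions with higher
    priority are drawn more often when sampling a training batch."""

    def __init__(self, capacity=10000):
        # Each entry is a [priority, transition] pair; old entries are evicted.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition, priority=1.0):
        self.buffer.append([priority, transition])

    def sample(self, batch_size):
        # Sampling weight proportional to priority reduces the dominance of
        # low-quality, highly correlated consecutive experiences.
        priorities = [p for p, _ in self.buffer]
        transitions = [t for _, t in self.buffer]
        return random.choices(transitions, weights=priorities, k=batch_size)

buf = PrioritizedReplay()
for i in range(100):
    # Hypothetical transitions (s, a, r, s'); later ones given higher priority.
    buf.push(("s", "a", 0.0, "s_next"), priority=1.0 + i)
batch = buf.sample(8)
assert len(batch) == 8
```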

The basic structure of an LSTM network is made up of a series of recurrently connected sub-networks (i.e. memory modules), each with an internal configuration (as shown in Fig. 2). A memory module is basically a memory store. The forget gate determines which information should be cleared from the unit state.
The old unit state c_{t−1} is updated to the new unit state c_t by discarding some of the information from the old unit state and adding filtered candidate values.
Finally, the output gate filters the unit state and calculates the required information, with the final output h_t = o_t ⊙ tanh(c_t), where c_t is the unit state, ĉ_t is the candidate state after the update, and h_t is the hidden layer state, i.e. the activation of the memory module.
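The gate operations described above can be sketched as a single memory-module step. The stacked-weight layout and dimensions below are conventions chosen for the example, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM memory-module step. W, U, b hold stacked parameters for the
    forget (f), input (i), candidate (g) and output (o) computations."""
    z = W @ x_t + U @ h_prev + b
    H = h_prev.size
    f = sigmoid(z[0*H:1*H])      # forget gate: what to clear from c_{t-1}
    i = sigmoid(z[1*H:2*H])      # input gate: how much candidate to add
    g = np.tanh(z[2*H:3*H])      # candidate state ĉ_t
    o = sigmoid(z[3*H:4*H])      # output gate: what to expose
    c_t = f * c_prev + i * g     # old unit state updated to new unit state
    h_t = o * np.tanh(c_t)       # hidden-layer state, the module's activation
    return h_t, c_t

rng = np.random.default_rng(1)
D, H = 3, 5                      # input and hidden sizes (arbitrary for the sketch)
h, c = np.zeros(H), np.zeros(H)
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
assert h.shape == (H,) and c.shape == (H,)
```

Because the output gate and tanh both bound their outputs, every component of h_t lies strictly inside (−1, 1), which keeps the recurrent signal numerically stable across long sequences.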

Step 4: Calculate the loss and obtain the reward r;
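The outer loop around this step can be sketched as follows. Everything here is a hypothetical simplification: the Dropout action space, the reward as negative validation loss, a tabular bandit-style Q update in place of the deep Q-network, and `train_lstm` as a mock stand-in for actually training the LSTM.

```python
import random

# Assumed action space: candidate Dropout rates the agent may choose.
DROPOUT_CHOICES = [0.1, 0.2, 0.3, 0.4, 0.5]

def train_lstm(dropout):
    """Placeholder for LSTM training: returns a mock validation loss whose
    minimum happens to sit at dropout = 0.3 (purely illustrative)."""
    return (dropout - 0.3) ** 2 + random.uniform(0.0, 0.01)

q = {a: 0.0 for a in DROPOUT_CHOICES}      # tabular stand-in for the Q-network
alpha, eps = 0.5, 0.2                      # learning rate, exploration rate
random.seed(0)
for episode in range(200):
    if random.random() < eps:
        a = random.choice(DROPOUT_CHOICES)  # explore
    else:
        a = max(q, key=q.get)               # exploit current estimates
    loss = train_lstm(a)                    # Step 4: calculate the loss ...
    r = -loss                               # ... and obtain the reward r
    q[a] += alpha * (r - q[a])              # simplified Q-value update

best = max(q, key=q.get)                    # Dropout rate the agent prefers
```

After enough episodes, the agent's preferred action converges toward the Dropout rate with the lowest validation loss, which is the self-optimization behaviour the framework relies on.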

It is essential to apply a variety of performance metrics when evaluating the predictive capability of the developed method. In this paper, two types of evaluation criteria are used: horizontal and directional prediction. To assess horizontal prediction accuracy, the MAE, MAPE and RMSE metrics are used, where MAE is the mean absolute error, MAPE the mean absolute percentage error and RMSE the root mean square error. The specific calculation formulas are given in Eq. 10 to Eq. 12.
where n is the number of predicted data points, and y_i and ŷ_i are the actual and predicted values, respectively. Generally, the smaller the values of the MAE, MAPE and RMSE metrics, the closer the method's predictions are to the true values, i.e. the higher the prediction accuracy. From an economic point of view, the ability to predict the correct direction is even more important than the accuracy of the horizontal predictions. The LSTM method only slightly outperforms the DQN-LSTM method in terms of this metric for soybean meal. In terms of directional prediction accuracy, the values of the DQN-LSTM method are all higher than those of the LSTM and ARIMA methods, while the SVR method has the lowest directional prediction accuracy. The

DQN-LSTM method has a directional accuracy value of 92.372%, which is 7.974% higher than that of the ARIMA method, 24.498% higher than that of the SVR method, and 2.102% higher than that of the LSTM method. This pattern also applies in the other cases (see also Fig. 5).
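The evaluation metrics above can be sketched directly. The level metrics follow their standard definitions (Eq. 10 to Eq. 12); the directional metric below is one common formulation (fraction of steps where the predicted price change has the correct sign) and may differ in detail from the paper's exact formula.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error (Eq. 10)."""
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error (Eq. 11), in percent."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Root mean square error (Eq. 12)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def directional_accuracy(y, y_hat):
    """Share (%) of steps where the predicted price moves in the true direction."""
    true_dir = np.sign(np.diff(y))
    pred_dir = np.sign(np.diff(y_hat))
    return 100.0 * np.mean(true_dir == pred_dir)

# Hypothetical actual and predicted price series for illustration.
y = np.array([100.0, 102.0, 101.0, 105.0])
y_hat = np.array([99.0, 103.0, 100.0, 106.0])
print(mae(y, y_hat))                    # 1.0
print(directional_accuracy(y, y_hat))   # 100.0
```

Lower MAE, MAPE and RMSE indicate predictions closer to the true levels, while a higher directional accuracy indicates that the economically important up/down movements are captured.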

The changes in agricultural commodities futures prices have a bearing on agriculture and even the national economy. In view of the highly adjustable nature of the neural network, this work can be improved in various directions, such as setting the network depth, the number of hidden units and the learning rate of the LSTM as parameters to be decided by reinforcement learning, so that automatic artificial-intelligence design of the LSTM network can be realized.

A variety of non-homogeneous information can be added as input to the neural network, with data processed by, for example, wavelet transforms.

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Prediction of agricultural commodities futures prices: A DQN-LSTM method".