Deep Learning Based Models: Basic LSTM, Bi LSTM, Stacked LSTM, CNN LSTM and Conv LSTM to Forecast Agricultural Commodities Prices

The literature argues that accurate price prediction for agricultural goods is essential to the proper functioning of economies all over the world. Research also reveals that studies applying deep learning to agricultural price forecasting on short historical price data are very scarce, and calls for the use of new deep learning methods for prediction. To fill this gap, this study employs five versions of LSTM deep learning techniques to predict the prices of five agricultural commodities, using univariate time series of Rice, Wheat, Gram, Banana, and Groundnut spanning January 2000 to July 2020. The study obtained good forecasting results for all five commodities with all five LSTM models. The results were validated by low values of the error metrics MAE, MAPE, MSE, and RMSE, and by a paired t-test at a 95% confidence level as a measure of robustness. The study also predicted the one-month-ahead future price for all five commodities and compared it with actual prices using the said LSTM models, obtaining promising results.


Introduction
Countries all over the world face frequent economic and political unrest due to global food price increases (Shiferaw, 2019). The price of agricultural products, a social signal between demand and supply, fluctuates enormously owing to many factors such as climate, policies, and the imbalance between production and information from the marketing side, and has thus attracted considerable attention (Weng et al., 2019). The continuous rise in food prices caused by a rapid increase in demand for food has left more than 800 million people around the world under direct threat of chronic malnutrition, and has therefore drawn great attention to food crop price prediction (Shao & Dai, 2018). Food price increases and fluctuations have important macro- and microeconomic impacts, such as adverse effects on a country's GDP, inflation, household problems, rising poverty, strain on people's financial lives, reduced nutrition, and curtailed consumption of essential services such as education and health care.
Further, the price of agricultural goods affects the agricultural land market, government policies, and entire agro-based industries. The variation in prices of agricultural commodities keeps farmers under emotional and financial stress, as years of hard work can go to waste (Chuluunsaikhan et al., 2020; Sabu & Kumar, 2020).
As food is essential to people's day-to-day life, accurate price prediction and knowing prices in advance are indispensable to properly guide agricultural production, balance supply and demand, increase farmers' income, assist farmers in planning their next crop, and help the government, farmers, agricultural businesses, and consumers gain clear market awareness, devise business plans, fine-tune their own finances, and reduce the risks and uncertainties they must handle. Although forecasting agricultural prices is pertinent and considered paramount for many economic actors (Ly, Traore, & Dia, 2021), the literature confirms that research has not reaped the benefits of deep learning-based agricultural price forecasting, which remains an unsolved problem to date (Sagheer & Kotb, 2019; Paroissien, 2020).
Machine learning and deep learning approaches have been applied to this problem, including ANN, SVR, ELM, RF, SVM, k-NN, hybrid NNs-ARIMAX with exogenous variables (Anggraeni et al., 2019), and deep learning techniques such as TDNN and LSTM networks (Manogna, 2020; Sabu & Kumar, 2020). However, studies applying deep learning to agricultural price forecasting are very scarce. It is further argued in the literature (Sabu & Kumar, 2020) that predictive analytics with deep learning techniques enables intensive analysis even on short historical agricultural price data, and is expected to solve the problems of all stakeholders. It is also felicitous to document that research calls for the use of new deep learning methods for forecasting (Elsheikh, Yacout, & Ouali, 2019). To fill this gap, this study employs five versions of LSTM deep learning techniques to predict the prices of five agricultural commodities. The primary aim of this research is thus to apply five versions of LSTM methods to five agricultural commodity price series for prediction tasks. The study contributes to the literature as follows. As a novelty, for the first time, this study employs five variants of LSTM, 1. Vanilla LSTM, 2. Bidirectional LSTM, 3. Stacked LSTM, 4. CNN LSTM, and 5. Conv LSTM, on the prices of five agricultural commodities: Rice, Wheat, Gram, Banana, and Groundnut. These five commodities are in common use; Rice in particular is one of the most important staple foods around the world (Maione & Barbosa, 2019). With the application of these five LSTM variants, the present scenario in agriculture is likely to improve, helping farmers gain basic knowledge about the best Minimum Support Price (MSP) for their crops. Further, it could act as a nerve centre for both farmers and buyers to weigh several choices and act accordingly (Rakhra et al., 2021). All the models have been evaluated with four error metrics, viz., MAE, MSE, MAPE, and RMSE, and also tested with a t-test to enhance their reliability.
All five deep learning models employed have shown good performance, which sheds light on the previously unexplored use of these five LSTM variants in agricultural price forecasting.
The remainder of this study is organized into seven sections, beginning with this Introduction and the review of related studies that follows.

Related studies on agricultural commodity prices time series
This section presents the relevant methodologies from three main domains, statistical, machine learning, and deep learning, applied to the prediction of agricultural prices. Dairi et al. (2021) state that many advances have been seen in artificial intelligence (AI) in this era, especially in deep learning (DL), an important part of AI. DL extracts relevant characteristics of the data automatically; as deep learning-driven methods do not depend on feature engineering, they hold an advantage over other ML methods. Nassar et al. (2020), comparing deep learning price prediction models with eight statistical and benchmark machine learning models on time series datasets of vegetables, fruits, and flowers, demonstrated that the deep learning models LSTM and CNN-LSTM can precisely predict fresh produce prices up to three weeks in advance. Sabu and Kumar (2020) used time-series and machine learning models to predict the monthly prices of arecanut in the Indian state of Kerala and found that the LSTM neural network performed well. Weng et al. (2019), examining the suitability of ARIMA and deep learning models on daily, weekly, and monthly datasets, identified deep learning as the standard method for forecasting agricultural goods prices. In the context of developing effective models, Ribeiro and dos Santos Coelho (2019) used RF, GBM, and XGB, adopting SVR, MLP, and KNN as baseline models; they ranked the models as 1. XGB, 2. GBM, 3. RF, 4. MLP, 5. SVR, and 6. KNN, and concluded that the ensemble approach performed well in the investigation of price sequence data.
In another study, Chen et al. (2019) reduced the noise of cabbage price data using Wavelet Analysis (WA); an LSTM model applied to the fine-tuned normalized data was found to produce better accuracy. While providing a concise summary of major deep learning techniques, Zhu et al. (2018) showed that DL methods such as CNN, RNN, and GAN are gaining momentum in helping researchers forecast agricultural prices. Rasheed et al. (2021) analysed a wheat price dataset with the LSTM technique; their study showed that LSTM performed significantly better than conventional machine learning and statistical time series models, and also noted that deep learning is a fairly new direction in agriculture.
The studies cited above, and the related studies detailed in Table 1, lead to a few important inferences. 1. There are many studies using various models (statistical, ML, and DL-based) for the prediction of agricultural commodity prices. 2. The literature clearly indicates that very few studies apply deep learning models to agricultural commodity price forecasting. 3. Further, to the best of our knowledge, no prior study has applied the five LSTM variants, vanilla or Basic LSTM, Bidirectional LSTM, Stacked LSTM, CNN LSTM, and Conv LSTM, to agricultural price forecasting. This study attempts to fill this gap by applying the five LSTM versions.

LSTM Network for modeling time series
The literature has shown that state-of-the-art DL methods can deliver good results because the temporal dependencies and structures of time series data are learned automatically (Dairi et al., 2021; Shin et al., 2021; Peng et al., 2020; and Zeroual et al., 2020). This section briefly describes the necessary concepts of the deep learning models considered, namely Basic LSTM, Bidirectional LSTM, Stacked LSTM, CNN LSTM, and Convolutional LSTM.

Recurrent neural network (RNN)
RNN, also known as vanilla RNN and the predecessor to LSTM (Deepa, Alli, & Gokila, 2021), is one of the most famous DL architectures owing to its ability to retain sequential information from historical data (Gautam, 2021), and is one of the recurrent neural network techniques used for time series forecasting (TSF). As displayed in Figure 1, the recurrent connection in an RNN links the network's outcome at time t to the inputs before time t. The equation defining the function of a single RNN cell is given below.
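The standard form of this update, which the following reconstruction assumes (the paper's own symbols may differ), is:

```latex
h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad
y_t = W_{hy}\, h_t + b_y
```

where $x_t$ is the input, $h_t$ the hidden state, and $y_t$ the output at time $t$, and $W_{xh}$, $W_{hh}$, $W_{hy}$, $b_h$, $b_y$ are learned weight matrices and bias vectors.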

Long short-term Memory (LSTM)
LSTM network models are very useful for time-series data (Sezer, Gudelek, & Ozbayoglu, 2020). Their hyperparameters include the number of hidden layers, units in each layer, network weight initialization, momentum, batch size, gradient clipping, gradient normalization, dropout, number of epochs, decay rate, learning rate, activation functions, and the optimizer. An LSTM cell regulates its memory through its gates, including the output gate (OG), input gate (IG), and update gate (UG). These function in a unique way to combine the long-term memory, short-term memory, and the input at a given time step, generating a new long-term memory, a new short-term memory, and the output at that time step. The function served by each gate is explained as follows.

Step 1: The input gate (i_t), which decides how much input information is transferred to the memory cell, is mathematically given as:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)

Step 2: The forget gate (f_t), which controls the information to be neglected, is mathematically expressed as:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

Step 3: The update gate (C̃_t), which controls and updates the information flowing out of the cell, is given by:

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C),  C_t = f_t * C_{t−1} + i_t * C̃_t

Step 4: The output gate (o_t), which updates the previous hidden state, is mathematically expressed as:

o_t = σ(W_o · [h_{t−1}, x_t] + b_o),  h_t = o_t * tanh(C_t)

Here the W and b terms are weight and bias matrices, "*" is the element-wise multiplication of vectors, and the sigmoid function is given by σ(x) = 1 / (1 + e^{−x}).
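As an illustration, the four gate computations above can be implemented as a single forward step in NumPy. This is a minimal sketch for exposition (the weight layout and function names are our own assumptions), not the Keras implementation used in the study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step.

    W maps the concatenation [h_{t-1}, x_t] to the stacked pre-activations
    of the four gates (input, forget, candidate/update, output)."""
    concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    z = W @ concat + b                       # stacked gate pre-activations
    n = h_prev.shape[0]
    i = sigmoid(z[0:n])                      # input gate
    f = sigmoid(z[n:2*n])                    # forget gate
    g = np.tanh(z[2*n:3*n])                  # candidate cell state (update gate)
    o = sigmoid(z[3*n:4*n])                  # output gate
    c_t = f * c_prev + i * g                 # new long-term memory
    h_t = o * np.tanh(c_t)                   # new short-term memory / output
    return h_t, c_t
```

With hidden size n and input size m, W has shape (4n, n+m); stacking the four gates in one matrix mirrors how most frameworks store LSTM weights.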

Conv LSTM
Convolutional Neural Network (CNN), another extensively applied deep learning architecture, has shown state-of-the-art performance on time series data as well, after its colossal accomplishments in the areas of image detection, speech, and facial recognition (Maity & Chatterjee, 2012; Vidal & Kristjanpoller, 2020). Its composition of many layers with distinctive architectures, convolutional layers and subsampling (pooling) layers (Wang, Mu, & Liu, 2021; Zhang et al., 2021; Vidal & Kristjanpoller, 2020), extracts apt information by learning the internal representation of the data through convolution and pooling operations (Livieris, Pintelas, & Pintelas, 2020; Ji et al., 2020). Its exclusively designed data pre-processing layers generate new feature values by applying a convolution operation between the raw input data and convolution kernels, a technique originally used to extract features from image datasets (Livieris, Pintelas, & Pintelas, 2020). Conv LSTM (Shastri et al., 2020) addresses the shortcomings of the standard fully connected long short-term memory (FC-LSTM) through its 3D tensor characteristics: 1. inputs K1, K2, K3, …, Kt; 2. cell outputs M1, M2, M3, …, Mt; and 3. hidden states h1, h2, h3, …, ht (Figure 3). In Conv LSTM, the input and past state determine the future state of a cell through the convolutional operator (*) used in both the state-to-state and input-to-state transitions (Arora, Kumar, & Panigrahi, 2020). The key equations 9 to 13 express the Conv LSTM mathematically, where "*" denotes the convolutional operator and "∘" the Hadamard product (Shastri et al., 2020):

i_t = σ(W_ki * K_t + W_hi * h_{t−1} + W_mi ∘ M_{t−1} + b_i)    (9)
f_t = σ(W_kf * K_t + W_hf * h_{t−1} + W_mf ∘ M_{t−1} + b_f)    (10)
M_t = f_t ∘ M_{t−1} + i_t ∘ tanh(W_km * K_t + W_hm * h_{t−1} + b_m)    (11)
o_t = σ(W_ko * K_t + W_ho * h_{t−1} + W_mo ∘ M_t + b_o)    (12)
h_t = o_t ∘ tanh(M_t)    (13)

Bidirectional LSTM (Bi LSTM)
Bi LSTM is a combination of a bidirectional RNN and LSTM cells (Graves & Schmidhuber, 2005; Shastri et al., 2020). Bi LSTM, a version of LSTM, is considered good for addressing time series data (Kulshrestha, Krishnaswamy, & Sharma, 2020). Contrary to LSTM (Figure 4), Bi-LSTM processes the data sequence in both directions, first-to-last and last-to-first, using both backward and forward information through an architecture that comprises forward and backward LSTM layers (Kulshrestha, Krishnaswamy, & Sharma, 2020; Mahto et al., 2021). The complete forecasted outputs are obtained in the forward pass by running all the inputs for time 1 ≤ k ≤ K: the forward states are processed from k = 1 to K and the backward states from k = K to 1. In the backward pass, the derivative of the objective function used in the forward pass is computed for time 1 ≤ k ≤ K, with the forward states processed from k = K to 1 and the backward states from k = 1 to K (Chen et al., 2019; Kulshrestha, Krishnaswamy, & Sharma, 2020).

Stacked LSTM
Stacked LSTM, or Deep LSTM, also described as a multilayer fully connected structure, has numerous hidden layers, each with numerous memory cells. Stacking multiple LSTM layers together increases model complexity and representational depth (Shastri et al., 2020; Devaraj et al., 2021). In a stacked LSTM (Figure 5), the present layer receives its input from the preceding layer, so that representations learned by lower layers are refined by the next higher-level layers for better optimized results. A stacked LSTM produces an output for every time step rather than a single output for all time steps (Arora, Kumar, & Panigrahi, 2020). For an unrolled stacked LSTM, the ℓ-th layer can be formulated as h_t^ℓ = LSTM(h_t^{ℓ−1}, h_{t−1}^ℓ), where the output sequence of layer ℓ−1 serves as the input to layer ℓ.

Data preparation
When working with supervised learners, a time series needs to be reformatted into a dataset of independent and dependent variables. Hence each commodity series in the original dataset is turned into pairs of X (the independent variable) and y (the dependent variable): for a window of n time steps, X_t = [p_t, p_{t+1}, …, p_{t+n−1}] and y_t = p_{t+n}, where p_t denotes the price at month t.
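A minimal NumPy sketch of this windowing step (the function name and window length are illustrative, not from the paper):

```python
import numpy as np

def split_sequence(sequence, n_steps):
    """Frame a univariate series as supervised (X, y) samples:
    X[i] = sequence[i : i + n_steps] and y[i] = sequence[i + n_steps]."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])   # window of past prices
        y.append(sequence[i + n_steps])     # next price to predict
    return np.array(X), np.array(y)

# e.g. monthly prices [10, 20, 30, 40, 50] with a 3-step window
X, y = split_sequence([10, 20, 30, 40, 50], n_steps=3)
```

Each row of X is one training window and the corresponding entry of y is the price of the following month.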

Stacked LSTM

Several hidden LSTM layers can be stacked on top of one another, the benefit being that the input values fed to the network pass through more than one LSTM layer in addition to propagating through time within each LSTM cell. The parameters are thus distributed across several layers, resulting in more thorough processing of the inputs at each time step. An LSTM layer needs a 3D input, and LSTMs by default produce a 2D output from the end of the sequence. This is addressed by having the LSTM output a value for each time step of the input data, by setting the return_sequences=True argument on the layer. This optional parameter lets the hidden LSTM layer emit a three-dimensional output that can be fed as input to the next layer. In this implementation, the stacked LSTM has the same other parameters as the Basic LSTM above.

CNN LSTM

The entire CNN model is wrapped so that the same CNN can be reused for every subsequence of the data, with the TimeDistributed wrapper applying the whole model once per input subsequence. The number of filters is the number of times the input sequence is read or interpreted, two in this implementation. The kernel size is the number of time steps included in each 'read' operation of the input sequence, which is two again here. The max-pooling layer that follows the convolutional layer keeps only the salient features and reduces the output to half its size, after which it is flattened to a single one-dimensional vector. This vector is then used as a single input time step by the LSTM at the back end.
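The subsequence layout required by such a CNN front-end can be sketched as a NumPy reshape. Here a window of four time steps per sample is split into two subsequences of two steps with one feature (the array contents and dimension names are illustrative):

```python
import numpy as np

# 2 samples, each a window of 4 time steps
X = np.arange(8, dtype=float).reshape(2, 4)

n_seq, n_steps_per_seq, n_features = 2, 2, 1
# [samples, subsequences, steps per subsequence, features] --
# the 4D shape a TimeDistributed CNN front-end expects
X_cnn = X.reshape((X.shape[0], n_seq, n_steps_per_seq, n_features))
```

The wrapped CNN then reads each of the two subsequences independently, and its flattened outputs become the time steps seen by the LSTM.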

Convolutional LSTM
Similar to the CNN LSTM discussed before, the convolutional LSTM removes the separate CNN in favour of a convolutional reading built directly into each LSTM unit. This model was specifically developed for two-dimensional spatio-temporal data, but it has been adapted to univariate time series in this implementation. The serial data is modified from its original form into a format similar to that of the previously defined CNN LSTM model; in addition, each subsequence must be converted into a two-dimensional array. This is achieved by treating each subsequence as an array with one row and as many columns as there are time steps in the subsequence. The Conv LSTM model is built with a single layer whose attributes are the number of filters, as in the CNN LSTM, and a two-dimensional kernel size in place of the one-dimensional kernel of the CNN LSTM. The kernel size is defined in terms of [rows, columns]; as mentioned, because the one-dimensional series is treated as two-dimensional, the number of rows is fixed to 1 and the number of columns is the size of the subsequence covered by the kernel. The output of the convolutional layer is then flattened and fed to the LSTM model to be used for prediction.

Evaluation metrics

All the models are evaluated with four error metrics:

MAE = (1/n) Σ |y_t − ŷ_t|
MSE = (1/n) Σ (y_t − ŷ_t)²
MAPE = (100/n) Σ |(y_t − ŷ_t) / y_t|
RMSE = √MSE

where y_t denotes the actual values and ŷ_t the predicted values of the models.
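The four error metrics used to compare actual and predicted values can be computed with a short NumPy helper; a sketch (the function name is ours):

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return MAE, MSE, MAPE (in percent) and RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                    # mean absolute error
    mse = np.mean(err ** 2)                       # mean squared error
    mape = 100.0 * np.mean(np.abs(err / y_true))  # mean absolute percentage error
    rmse = np.sqrt(mse)                           # root mean squared error
    return mae, mse, mape, rmse
```

MAPE assumes no actual value is zero, which holds for commodity prices.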

System Specifications
All experiments were carried out in the Google Colaboratory basic version, with GPU hardware acceleration. The detailed specifications are given in Table 3.

Dataset
The dataset used in this research was obtained from the RBI (Reserve Bank of India) website, Government of India. It comprises five independent time series of wholesale agricultural commodity prices from January 2000 to July 2020. The five commodity series are Rice, Wheat, Gram, Banana, and Groundnut, among the most important food crops for civilizations around the world; owing to the intrinsic complexity of the commodity market, their price forecasting has always proven an intractable task (Kamdem, Essomba, & Berinyuy, 2020). Each time series has been divided into training (80%) and testing (20%) sets; the 20 percent test set, which serves to evaluate the forecasting models, provides a considerable amount of unseen ''out-of-sample'' data (Livieris, Pintelas, & Pintelas, 2020). Table 4 exhibits the statistical properties of the agricultural commodity prices. The skewness statistics show positive skewness, and the distributions are roughly symmetrical, with values between −0.5 and 0.5, except for the products gram and onion. The kurtosis results clearly show that the excess kurtosis statistic indicates the leptokurtic character of the data.
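The 80/20 chronological split can be sketched as follows; with 247 monthly observations (January 2000 to July 2020) this yields 197 training and 50 test points (the helper name and stand-in data are ours):

```python
import numpy as np

def chronological_split(series, train_frac=0.8):
    """Split a time series chronologically: the first train_frac of the
    observations form the training set, the remainder the test set."""
    n_train = int(len(series) * train_frac)
    return series[:n_train], series[n_train:]

prices = np.arange(247, dtype=float)  # stand-in for 247 monthly prices
train, test = chronological_split(prices)
```

Splitting by position rather than at random preserves the temporal order, so the test set is genuinely "out-of-sample" future data.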
Shapiro-Wilk (SW) test results (p-value less than 0.01) clearly show that the variables do not follow a normal distribution, and hence clearly indicate (Hansen, McDonald, & Nelson, 1999; Kulshrestha, Krishnaswamy, & Sharma, 2020) that non-parametric methods are warranted for such a dataset. Stationarity, however, remains an important issue when statistical modeling is used as a benchmark.
The loss-versus-epochs graphs for all five LSTM versions on all the commodities, plotted in Figures 8(a-e) to 12(a-e), clearly exhibit a good learning rate, indicating that the data was transformed into the appropriate format. This further confirms that each training sample consists of a series of data points: the series is fed into the LSTM layers of all five variants, and the output of the previous time step is carried to the next input in the sequence.
The focus of the study is to predict the prices of the five agricultural commodities using the five LSTM variants, and a paired t-test is used to validate the forecasts. With μ1 and μ2 denoting the mean of the actual values of the time series and the mean of the forecasted values of each model, the null hypothesis H0: μ1 = μ2 is tested; the t-test suggests the null hypothesis can be accepted at the 0.05 level of significance. The results of the paired t-test are shown in Table 7. As per these statistical tests, the agricultural price forecasts produced by all five LSTM variants are acceptable. This further confirms that forecasting agricultural commodity prices from a small dataset with good accuracy using these five LSTM variants looks promising.
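Such a paired t-test can be reproduced with SciPy's `ttest_rel`; a sketch with hypothetical actual and forecast values standing in for one commodity's test set:

```python
import numpy as np
from scipy import stats

# hypothetical actual vs forecasted monthly prices (illustrative values)
actual   = np.array([101.0, 98.5, 103.2, 99.8, 102.1, 100.4])
forecast = np.array([100.2, 99.1, 102.8, 100.5, 101.6, 100.9])

# H0: mu1 == mu2 (mean of actual equals mean of forecast), paired samples
t_stat, p_value = stats.ttest_rel(actual, forecast)

# fail to reject H0 at the 95% confidence level when p > 0.05
accept_h0 = p_value > 0.05
```

A paired test is appropriate here because each forecast is matched to the actual price of the same month.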

Prediction of future price of Rice, Wheat, Gram, Banana, and Groundnut
The predicted values of all five commodities, Rice, Wheat, Gram, Banana, and Groundnut, for August 2020 (one month ahead), and their comparison with the reported actual values, are described in Table 8. As found in Table 8, the study forecasted the one-month-ahead future prices of all five agricultural commodities under consideration, compared the predicted prices with the actual prices, and found the results very promising. Hence it is clear that agricultural commodity prices can be forecasted accurately using these five deep learning-based LSTM models, providing valid and useful information to farmers, the public, the government, and traders. In the future, adding more agricultural commodities from around the world and comparing the results with popular machine learning techniques could provide further insights into the prediction of agricultural commodity prices.

Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.

Funding details
Not applicable