Onions are grown in both the rabi and kharif seasons, producing both seasonal and annual effects on daily price movements and volatility. The study therefore uses daily per-quintal onion prices over eleven years, from January 2010 to March 2021, drawn from the AGMARKNET database. Seven major wholesale markets were selected on the basis of their daily arrival volumes relative to other wholesale markets across the country, with the aim of better elucidating the precarious nature of onion prices. Maharashtra is the largest onion-producing state in India, and as a result Lasalgaon and Pimplegaon act as important producer markets and key sources of onion supply. Azadpur, on the other hand, is one of the largest consumer markets, serving major parts of northern India. For a broader perspective, the wholesale markets of Ahmedabad, Bengaluru, Kolkata and Ludhiana were also selected; these lie in states with high consumption as well as high production of vegetables such as onions, potatoes and tomatoes. From the onion prices, daily returns are calculated, providing normalized values that in turn help in estimating the variation in price changes, known as volatility. The accuracy of volatility prediction for various asset classes has been studied extensively, and many studies continue to refine these accuracy measures. In the capital markets, volatility is usually estimated using GARCH (generalized autoregressive conditional heteroskedasticity) methods and their variants. The GARCH model is an extended version of the ARCH (autoregressive conditional heteroskedasticity) model; the difference between the two is that the ARCH model uses previous periods' error terms alone to describe the variance of the current error term.
A GARCH(1,1) model, by contrast, also takes previous-period variance terms into account. GARCH was first proposed by Bollerslev (1986) and is considered one of the most robust methods for modelling changing variance in a time series. Many extensions have since emerged, such as FIAPGARCH, APGARCH, EGARCH and SEARCH (Cao et al. (2009), Ding et al. (2007), Nelson & Cao (1992), Canarella & Pollard (2007)), among others. These methods tend to have comparable accuracy levels, as there is an intricate relationship among variables that depends on the problem and the size of the data set (Bildirici & Ersin (2009), Nelson (1991), Glosten et al. (1993)).

The latest improvements in deep learning and its application to forecasting are grounded in ANN (artificial neural network) models. Kamruzzaman & Sarker (2004) and Wang et al. (2006) observed in their work that ANNs imitate the human brain in acquiring and organizing knowledge. One of the many benefits of ANNs is that there is no need to specify the functional relationship among the network's variables (Kristjanpoller & Minutolo (2015), Liu & So (2020)), which makes it possible to add unordered input variables to a given model. According to Benidis et al. (2020), although ANNs achieve high accuracy, their extensions such as RNNs (recurrent neural networks) achieve extraordinary predictive accuracy. RNNs process sequential data by holding it in a form of memory cell and self-learn over many iterations; they have been used in applications such as speech recognition, text-to-speech conversion and time-series prediction. There are many examples in the finance industry where RNNs and GARCH-type models are used for prediction across asset classes, and RNNs have been shown to deliver better predictive accuracy than their ARCH-type counterparts (Petropoulos et al. (2017), Henriquez & Kristjanpoller (2019), Gers et al. (2000)). Many hybrid models have also been proposed for various commodities (Bildirici & Ersin (2009), Kristjanpoller & Minutolo (2016), Lu et al. (2016)) and shown to have advantages over plain RNNs or time-series models. Special forms of RNN such as the LSTM (long short-term memory) network and the stacked LSTM have lately been shown to achieve enhanced results on time-series data. Hochreiter & Schmidhuber (1997) were the first to conceptualize the LSTM network, and it has since shown successful outcomes in various time-series forecasting tasks.
LSTMs can remember patterns in data over long durations and are hence well suited to forecasting long sequences. The stacked LSTM comprises multiple LSTM layers and was first formulated by Graves (2013); its first widely known application was to speech recognition. In a typical stacked LSTM, the initial LSTM layer produces a sequence of vectors that is fed as input to the succeeding LSTM layer, which simultaneously receives feedback from its own earlier time steps; this helps capture deep-rooted past information and thereby learn the patterns of the specific time series.

These recent developments in AI and machine learning open up the potential for better predictive accuracy on asset prices and volatility, helping all stakeholders manage the risks of their financial investments. In this paper, we propose a hybrid model that uses the traditional GARCH method to calculate conditional volatility and a stacked LSTM to forecast its values for the next 10, 15 and 20 days. We compare this model with the traditional GARCH alone and with other prominent AI/ML models such as ARIMA and plain vanilla LSTMs with varied input units. We expect the proposed model to incorporate the variance fluctuations captured by the GARCH models along with the nonlinear relationships among variables captured by RNN-based LSTMs, thereby providing an enhanced forecast of volatility.

__Volatility Calculation:__ Volatility is an important criterion for understanding the variation in the price of an asset, whether a stock or a commodity. Volatility can suitably be defined as the variance of the returns of the underlying asset prices. The general formula for calculating volatility is given by the following equation:

V_t = sqrt( (1/T) Σ_{i=t−T+1}^{t} (R_i − R̄)² )  (1)

R_i = ln(P_i / P_{i−1})  (2)

where V_t in equation 1 is the realized volatility at day t over T trading days, R_i in equation 2 is the true return (TR) of the onion price on day i, P_i is the price on day i, and R̄ is the average return of onion prices during the T trading days. In this study, three scenarios are considered, i.e. T = 10, 15 and 20 days.
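As a sketch of the volatility calculation described above (assuming log returns for the "true returns" and the standard realized-volatility formula; function names are illustrative):

```python
import numpy as np

def true_returns(prices):
    """Daily log returns computed from a price series."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1])

def realized_volatility(returns, T):
    """Rolling realized volatility over T-day windows: each output value
    is the standard deviation of the most recent T returns around their
    window mean."""
    returns = np.asarray(returns, dtype=float)
    vols = []
    for t in range(T, len(returns) + 1):
        window = returns[t - T:t]
        vols.append(np.sqrt(np.mean((window - window.mean()) ** 2)))
    return np.array(vols)
```

With T = 10, 15 or 20 this yields the three forecasting scenarios considered in the study.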

__Measures of Prediction errors:__

To compare the performance of the models, four measures of prediction error are used: MSE, MAE, RMSE and MAPE. MAPE in particular is extensively applied in the financial-engineering literature on models for commodities and foreign exchange markets. Authors such as Bentes (2015) and Kristjanpoller & Hernández (2017) have comprehensively used these four measures in volatility forecasting for various commodity markets, and they have also been adopted in studies of prediction models for the foreign exchange market, such as Sermpinis et al. (2012), Petropoulos et al. (2017) and Henriquez & Kristjanpoller (2019). To be in line with the literature, all four are utilized in this study to compare the different volatility prediction models for onion prices.
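The four error measures can be computed directly; a minimal numpy sketch:

```python
import numpy as np

def prediction_errors(actual, forecast):
    """MSE, MAE, RMSE and MAPE for a pair of actual/forecast series.
    MAPE assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err / actual)) * 100.0
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape}
```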

__GARCH(1,1):__ Traditional methods based on OLS (ordinary least squares) are not suitable for financial time-series data, since the variance is not constant over time, i.e. it is heteroskedastic. To address this, Engle (1982) introduced the ARCH (autoregressive conditional heteroskedasticity) model, in which error terms are autoregressive, i.e. the current error term depends on previous periods' error terms. Bollerslev (1986) extended this approach with the GARCH (generalized autoregressive conditional heteroskedasticity) model, which takes into account both previous conditional variances and previous error terms when computing the current conditional variance. GARCH also captures the volatility clustering commonly present in financial time series. The GARCH(p, q) model can be represented by the following equation:

σ²_t = α_0 + Σ_{i=1}^{p} α_i ε²_{t−i} + Σ_{j=1}^{q} β_j σ²_{t−j}

where α_0 > 0, α_i ≥ 0 (i = 1, . . . , p) and β_j ≥ 0 (j = 1, . . . , q), which guarantees that the conditional variance of the GARCH(p, q) model is always positive.

In this study, volatility is calculated from parameters estimated by fitting the GARCH(1,1) model to the returns of daily maximum prices. The order of the error and conditional-variance terms is taken as 1, based on descriptive analysis of various parameters when fitting the returns on daily max-price data. From figure 3 it is clearly visible that the conditional volatility tends to cluster, showing the heteroskedastic property of the time series for the Lasalgaon, Azadpur and Pimplegaon data. The GARCH method is also best suited to high-frequency time-series data, making it a suitable approach for the daily returns of onion max-price data.
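Specialized to GARCH(1,1), the conditional-variance recursion above can be sketched in plain Python. The parameter values are assumptions for illustration; in practice they come from maximum-likelihood fitting, e.g. with the Python `arch` package:

```python
import numpy as np

def garch11_variance(resid, alpha0, alpha1, beta1):
    """Conditional-variance recursion of GARCH(1,1):
    sigma2[t] = alpha0 + alpha1 * resid[t-1]**2 + beta1 * sigma2[t-1]."""
    resid = np.asarray(resid, dtype=float)
    sigma2 = np.empty(len(resid))
    sigma2[0] = resid.var()  # common choice: initialize at the sample variance
    for t in range(1, len(resid)):
        sigma2[t] = alpha0 + alpha1 * resid[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2
```

With alpha0 > 0 and alpha1, beta1 ≥ 0, every sigma2[t] stays positive, mirroring the constraints on the GARCH(p, q) coefficients.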

__ARIMA:__ As the name suggests, auto-regressive integrated moving average models are a class of models for forecasting time series with linear properties, based on the Box-Jenkins methodology (Box et al. (2015)). ARIMA can also perform exponential smoothing of a time series, supporting better data representation and avoiding overfitting, which helps accuracy. ARIMA offers a flexible approach to forecasting, as it can handle various types of time series. However, these models have serious drawbacks: they are weak at interpreting non-linear time series, since they assume a linear relationship among variables, and are hence not suitable for complex problems (Zhang (2003)). For this reason, ARIMA models are usually preferred for short-term forecasting (Shukla & Jharkharia (2013)). Compared with various ANN models, ARIMA has shown only slight differences in performance depending on the nature of the forecast: according to Darbellay & Slama (2000), ARIMA models achieve better accuracy than other ANNs for small forecasting windows, while Zhang (2003) proposes that a hybrid of ANNs and ARIMA has greater predictive power than either model used standalone. Compared with recent AI methods such as LSTMs, ARIMA tends to show inferior performance (Zhang et al. (2021), Wang et al. (2021)). In this study, we use stacked LSTMs, both multivariate and univariate, for prediction and compare them with the predictions of ARIMA models.

__LSTM:__ As mentioned above, the LSTM (long short-term memory) network is a standard RNN variant designed to deal with the vanishing-gradient problem. It comprises recurrent gates commonly known as the input, output and forget gates. The LSTM can adapt to deep learning tasks demanding long-term memory of events, and it also enables attenuation of signals that have both low- and high-frequency components.

The compact forms of the equations for the forward pass of an LSTM unit with a forget gate are as follows:

f_t = 𝝋(W_fx x_t + W_fh h_{t−1} + b_f)  (7)

i_t = 𝝋(W_ix x_t + W_ih h_{t−1} + b_i)  (8)

o_t = 𝝋(W_ox x_t + W_oh h_{t−1} + b_o)  (9)

z_t = f_t ⊙ z_{t−1} + i_t ⊙ tanh(W_cx x_t + W_ch h_{t−1} + b_c)  (10)

h_t = o_t ⊙ tanh(z_t)  (11)

where z_t is the memory cell, i_t is the input gate, f_t is the forget gate, o_t is the output gate, ⊙ denotes the element-wise product, x_t is the input vector, h_t is the hidden state (output vector), 𝝋(·) is the sigmoid function and tanh(·) is the hyperbolic tangent function.
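Equations (7)-(11) can be sketched as a single numpy forward step; the dictionary-based weight layout is an illustrative assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, z_prev, W, b):
    """One forward pass of an LSTM unit with a forget gate.

    W is a dict of weight matrices keyed "fx", "fh", "ix", "ih", "ox",
    "oh", "cx", "ch"; b is a dict of bias vectors keyed "f", "i", "o", "c".
    """
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])  # forget gate (7)
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])  # input gate (8)
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])  # output gate (9)
    # memory cell update (10): keep part of the old cell, add new candidate
    z_t = f_t * z_prev + i_t * np.tanh(W["cx"] @ x_t + W["ch"] @ h_prev + b["c"])
    h_t = o_t * np.tanh(z_t)  # hidden state / output (11)
    return h_t, z_t
```

Frameworks such as TensorFlow implement exactly this recurrence internally; the sketch is only meant to make the gate equations concrete.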

Onion sales-price data are taken for the period from 1st January 2010 to 31st March 2021 for various tier-1 (Azadpur, Bengaluru, Kolkata) and tier-2 (Lasalgaon, Pimplegaon, Ahmedabad, Ludhiana) wholesale vegetable markets.

From the daily onion max-price data, "true returns" are calculated, and from these the corresponding daily conditional volatilities using the GARCH(1,1) method, as shown in figure 3 for the Indian wholesale markets of Azadpur, Lasalgaon and Pimplegaon. As can be seen in the chart, the conditional volatility is representative of the variation in the max-price returns. Using Python libraries specifically designed for the GARCH model, volatilities for the next 10, 15 and 20 days are forecasted. The GARCH-based conditional volatilities are then also forecasted for the next 10, 15 and 20 days using deep-learning-based LSTMs (long short-term memory) in different configurations: first taking only the daily max-price data together with the conditional volatilities as input, and then taking max prices in the first scenario and min, max and modal prices in the second scenario, along with the conditional volatilities. The forecasted values are checked for predictive accuracy and compared with each other, i.e. first the GARCH(1,1) forecasting accuracy and then that of the LSTM methods with various parameters.

The data set was divided into training and testing sets: the training set consisted of 90% of the values and the testing set of the remaining 10%.

For the LSTM-based models, the values are normalized after being separated into training and testing sets. Normalization simplifies the training process and makes it more robust (Shavit (2000), Yin et al. (2017)). It has also proved effective in classification problems (Jayalakshmi & Santhakumaran (2011)) and is widely used in neural-network-based deep learning, as it helps stabilize the overall process and reduces the number of training epochs required. To implement the LSTM models, TensorFlow libraries are used, as they are among the most convenient means of applying neural-network-based deep learning models (Heaton et al. (2016), Chen et al. (2020)). TensorFlow enhances productivity and helps with automatic optimization.
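A minimal sketch of the chronological 90/10 split and train-only min-max normalization described above (function names are illustrative; a library scaler such as scikit-learn's MinMaxScaler would serve equally well, provided it is fit on the training set only):

```python
import numpy as np

def split_series(values, train_frac=0.9):
    """Chronological split: no shuffling, since order matters in time series."""
    values = np.asarray(values, dtype=float)
    cut = int(len(values) * train_frac)
    return values[:cut], values[cut:]

def minmax_fit(train):
    """Learn scaling bounds from the training set only, to avoid leakage."""
    return float(np.min(train)), float(np.max(train))

def minmax_apply(values, lo, hi):
    """Scale values to [0, 1] using the training-set bounds."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)
```

Fitting the bounds on the training set and reusing them on the test set keeps information from the test period from leaking into training.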

The same architecture is used with layers of different shapes and names. The configuration consists of one input LSTM layer of 150 units (neurons), followed by another LSTM layer of 200 units and two hidden layers of 150 and 70 units respectively.

Finally, one output layer delivers the forecasted results. The complexity of the architecture can be increased by adding layers as well as neurons. To ensure that the model always returns the same results, the random seed of the neural-network architecture is set to zero. During training, instances are divided into batches for optimization; here a batch size of 500 values is used, following the minibatch method formulated by Goodfellow et al. (2016). To maintain a consistent gradient, a small learning rate is needed; Adam (Kingma et al.), an adaptive-learning-rate algorithm, is used for training, as its bias-correction property helps in deciding an appropriate rate.
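A hedged TensorFlow/Keras sketch of the stacked configuration described above; the input shape, hidden-layer activations and learning rate are illustrative assumptions not stated in the text:

```python
import tensorflow as tf

tf.random.set_seed(0)  # fixed seed so the model returns the same results

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 2)),                       # illustrative: 20 time steps, 2 features
    tf.keras.layers.LSTM(150, return_sequences=True),    # input LSTM layer, 150 units
    tf.keras.layers.LSTM(200),                           # second LSTM layer, 200 units
    tf.keras.layers.Dense(150, activation="relu"),       # hidden layers of 150 and 70 units
    tf.keras.layers.Dense(70, activation="relu"),
    tf.keras.layers.Dense(1),                            # output layer: forecasted volatility
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
# model.fit(X_train, y_train, batch_size=500, ...)       # minibatches of 500, per the text
```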