Over the last few decades, many machine learning tools and techniques have been applied to improve the prediction of stock prices. This section presents recent and significant analyses and results, particularly for the period 2015–2020.
Several machine learning techniques were compared to identify the most adequate model for stock price prediction: Extreme Learning Machine (ELM), Backpropagation Neural Network (BPNN), Radial Basis Neural Network (RBNN), and Deep Learning (DL) (Chen et al. 2018). Datasets of three sizes (small, medium, and large) of the CSI 300 Index Future associated with the Shanghai and Shenzhen Stock Exchanges were used for the analysis. The results indicate that Deep Learning performs comparatively better than the other models. The paper also suggests that the performance metrics of the model improve as the sample size increases, which means that a larger dataset may characterize the intrinsic nature of the stock price more deeply.
Another comparative study used a Deep Neural Network (DNN), a Long Short-Term Memory (LSTM) network, logistic regression, and Random Forest (Fischer & Krauss 2018). A very large dataset (25 years and 9 months) of S&P 500 constituents, obtained from Thomson Reuters, was gathered for the period December-1989 to September-2015. The study concluded that the performance metrics of the LSTM are better than those of the other predictive models.
A mixture model proposed by Guo et al. (2018) uses three types of datasets: a daily dataset from the listing date to 31-March-2017, a 30-minute dataset collected between 1-January-2017 and 28-February-2017, and a 5-minute dataset collected between 1-February-2017 and 28-February-2017. The datasets cover SH600006, SH6000016, SH6000036, and SH6000056, which are listed on the Shanghai Stock Exchange. Adaptive Support Vector Regression (ASVR), which combines Particle Swarm Optimization (PSO) with traditional Support Vector Regression (SVR), was applied to these datasets. The performance metrics indicate that ASVR has slightly better predictive capability than the Back Propagation Neural Network (BPNN) and SVR in terms of MAD, MAPE, and RMSE.
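The three error metrics named above recur throughout this review, so a minimal sketch of how they are commonly computed may be helpful. The definitions below follow the usual forecasting conventions and are an assumption, not the exact formulas used by Guo et al. (2018); the function names and the sample prices are hypothetical and chosen only for illustration.

```python
import math

def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical prices and forecasts, for illustration only.
actual = [10.0, 20.0, 40.0, 50.0]
predicted = [11.0, 18.0, 40.0, 55.0]

print(f"MAD  = {mad(actual, predicted):.4f}")
print(f"MAPE = {mape(actual, predicted):.4f} %")
print(f"RMSE = {rmse(actual, predicted):.4f}")
```

MAD and RMSE are expressed in price units, whereas MAPE is scale-free, which is one reason studies often report all three together. RMSE penalizes large individual errors more heavily than MAD does.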
An automated model, Xuanwu, was developed by Zhang et al. (2018) to forecast future trends of the stock index. The dataset covers 495 stocks listed in the Shenzhen Growth Enterprise Index and was collected between 25-January-2010 and 1-October-2016 (daily stock prices over approx. 6 years and 9 months). The proposed model implements the Random Forest model using WEKA. The Random Forest model performs slightly better than ANN, SVM, and kNN on the performance metrics Prediction Duration (PD) and Return of the Trade (ROT).
Tree-based models were applied for the first time to predict stock price movement using a high-frequency dataset (Basak et al. 2019). The dataset of Facebook and Apple (size 10 kB to 700 kB with 1180 to 10700 samples) was collected from the listing date to 3-February-2017. Two models, XGBoost and Random Forest (RF), were applied to the high-frequency dataset, and the XGBoost model was found to provide an accuracy of 78%, which is better than that of Random Forest. The experimental results also report that the performance of tree-based models can be improved by implementing an ensemble model.
A Multi-Filter Neural Network (MFNN) was used on a high-frequency dataset of the CSI 300 stock price index (Long et al. 2019). The dataset covers approx. 3 years, from 24-December-2013 to 7-December-2016. The MFNN model was applied to a 30-set dataset of CSI stock prices. The model performed 6.28% better than the comparable CNN and RNN models. The performance metrics of the MFNN were also compared with those of Linear Regression, Logistic Regression, LSTM, SVM, and Random Forest. The experimental results indicate that the MFNN model provides far better predictive capability than the other models in terms of the Rate of Average Access Return, Total Return, Return Rate, etc.
A model combining a Genetic Algorithm (GA) and a Convolutional Neural Network (CNN) was proposed and applied to a seventeen-year dataset of daily KOSPI stock prices collected between 4-January-2000 and 31-December-2016 (Chung et al. 2020). The experimental results suggest that the CNN achieved 70.16% accuracy without the Genetic Algorithm and 73.74% accuracy when combined with the GA. The study also suggests that deep learning may improve the results further.
A modified Convolutional Neural Network model, “CNNpred”, was applied to daily data of the S&P 500, NASDAQ, NYSE, and RUSSELL 2000 indices collected between Jan-2010 and Jan-2017 (Hoseinzade & Haratizadeh 2019). The model achieved a 3–11% improvement in terms of F-measure. The study suggests that “CNNpred” can be modified further in the future to obtain more precise results.
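Since this study is evaluated with a macro-averaged F-measure (Table 1), a minimal sketch of that metric for up/down direction labels is given here. It assumes the F1 form of the F-measure (equal weight on precision and recall); the helper names and the label sequences are hypothetical and are not taken from Hoseinzade & Haratizadeh (2019).

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score of a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Return 0.0 when the class never occurs and is never predicted.
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)

# Hypothetical daily directions: 1 = price up, 0 = price down.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

print(f"F1 (up)   = {f1_for_class(y_true, y_pred, 1):.4f}")
print(f"F1 (down) = {f1_for_class(y_true, y_pred, 0):.4f}")
print(f"Macro F1  = {macro_f1(y_true, y_pred):.4f}")
```

Macro-averaging gives each class the same weight regardless of how often it occurs, so a model that always predicts the majority direction is penalized even when its plain accuracy looks acceptable.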
A comparative study using machine-learning models to forecast the stock price index (time series) was carried out by Ersan et al. (2019). Three machine-learning frameworks, k-NN, ANN, and SVM, were applied to ten years of daily and hourly data of the DAX 30 and S&P 500. The experimental results were evaluated in terms of SS, DS, and RMSE, with the following findings: (i) hourly data has better prediction capability than daily data up to a certain extent; (ii) k-NN performs better in terms of RMSE (minimum, average, and maximum) on both datasets; (iii) SVM provides stable results, whereas ANN and k-NN outperform SVM in terms of RMSE.
Shah et al. (2019) conducted a comprehensive study reviewing the taxonomy of prediction techniques in the domain of stock price prediction. Many machine learning tools and techniques were analyzed, and the study concluded that (i) a longer-term dataset can contain less noise and offer more prediction capability, and (ii) a mixture model (a mixture of machine learning and statistical models) has better capability to predict the stock price.
Gandhmal & Kumar (2019) carried out a systematic analysis and review of stock price index prediction, considering the frameworks of 50 research papers published between 2010 and 2018 and covering techniques such as fuzzy-based techniques, ANN, NN, SVM, SVR, HMM, and K-means. The paper concluded as follows. (i) It reported the research gaps and challenges of the different clustering and classification techniques, such as the Bayesian model, fuzzy classifier, ANN, SVM, decision support system, machine learning, CNN, etc., and found that ANN is the most widely applied method for stock price prediction. (ii) It also explored the different datasets used in the period and analyzed the reported performance in terms of MAPE, RMSE, accuracy, sensitivity, specificity, MAE, etc. (iii) It concluded that stock price prediction is a very complicated task, so in addition to historical dataset analysis, other factors may also need to be considered for more precise prediction results.
A mixture model combining an Ensemble Adaptive Neuro-Fuzzy Inference System (ENANFIS) and Support Vector Regression was developed (Zhang et al. 2020) and applied to the datasets of four securities, 002570, 600422, 000049, and 002375, listed on the Shanghai and Shenzhen Stock Exchanges. The dataset was collected between Jan-2012 and Jan-2017, and the model analyzed this nearly five-year daily historical dataset. The experimental results indicate that the mixture model performs better than the single-stage ENANFIS as well as the two-stage models (SVR-SVR, SVR-Linear, and SVR-ANN) on the performance metrics RMSE, MSE, MAE, and MAPE.
A review study was performed by Jiang (2020) on the use of deep learning models for stock price forecasting. Recent studies were reviewed, especially the research of the last three years in this field. About 100 research papers, with their datasets, tools, and techniques, were analyzed in depth. The study found that deep learning is a suitable tool with considerable scope for forecasting stock price index trends in the near future.
A detailed study was performed by Reschenhofer et al. (2020) to explore and review 96 research papers published in the SCI index in 2016. The paper reached surprising conclusions: (i) the study is critical of the rare use and non-availability of high-frequency datasets; (ii) it also comments on the benchmarks and suitability of the datasets; (iii) it concludes that, despite the hype about financial big data and sophisticated machine learning frameworks, relevant and truly experimental studies are rare.
Despite more than fifty years of extensive research in this area, researchers have not been able to identify a single definitive, well-performing model that provides optimal predictive results. Table 1 summarizes the reviewed models in terms of the dataset used, target output, number of samples, sampling period, method, and performance measures. This paper proposes a trio-ensemble of deep learning models, the most popular class of machine-learning models, to optimize the prediction results.
Table 1
Stock Price Literature Review – Summary
Reference | Source of data | Targeted output | Frequency of Samples | Period of sampling | Applied method | Performance metrics |
Chen et al. 2018 | CSI 300 Index Future associated with Shanghai and Shenzhen Stock Exchange | Opening price Forecast | small, medium, and large size high-frequency dataset | 20-February-2017 to 20-April-2017 | Neural Network, Deep Learning, Extreme Learning Machine, Backpropagation, and Radial Basis Neural Network | Directional Predictive Accuracy (DA) |
Fischer & Krauss 2018 | S&P 500 constituents (Thomson Reuters data) | Stock Price | Daily dataset of 25 years 9 months | December-1989 to September-2015 | LSTM, Logistic Regression, Deep Neural Network, Random Forest | Probability of LSTM, RMSE, MAPE, DM, PT |
Guo et al. 2018 | Shanghai Stock Exchange benchmark datasets (SH600006, SH6000016, SH6000036, SH6000056) | Stock Price | 5-minute, 30-minute, and daily datasets (3 types of dataset) | 5-minute: 1-February-2017 to 28-February-2017; 30-minute: 1-January-2017 to 28-February-2017; Daily: listing date to 31-March-2017 | SVR, BPNN, Adaptive SVR | RMSE, MAPE, MAD |
Zhang et al. 2018 | Shenzhen Growth Enterprise Index (495 listed stocks) | Close Index Forecast | 6 years and 9 months Daily Stock Price | 25-January-2010 to 1-October-2016 | Xuanwu | Return of the Trade (ROT), Prediction Duration (PD) |
Basak et al. 2019 | Apple and Facebook stock price | Stock Price Return | 10 kB to 700 kB in size (1180 to 10700 samples) | Date of listing to 3-February-2017 | Random Forest, XGBoost | Specificity, F-Score, Brier, AUC, Accuracy, Recall, Precision |
Long et al. 2019 | CSI 300 | Stock Return | 3 years (approx.) High Frequency (1-minute) | 24-December-2013 to 7-December-2016 | MFNN = DNN + (2D2) feature extraction | Rate of Average Access Return, Total Return, Return Rate, etc. |
Chung et al. 2020 | Stocks of KOSPI, Bloomberg | Stock Return | 17 years (daily stock price) | 04-January-2000 to 31-December-2016 | CNN + GA | Comparative Accuracy |
Hoseinzade & Haratizadeh 2019 | S&P 500, NASDAQ, NYSE, and RUSSELL 2000 indices | Close Price Forecast | Daily samples | January-2010 to January-2017 | CNN (CNNpred) | Macro-Averaged F-Measure |
Ersan et al. 2019 | Stocks of DAX 30 and S&P 500 | Stock Return | 10 years of daily and hourly data | 02-January-2004 08:00 GMT to 06-March-2015 20:00 | k-NN, ANN, and SVM | SS, DS, and RMSE |
Shah et al. 2019 | NYSE, S&P 500, etc. | Price Return and others | Daily, Weekly | Different periods for the different papers | Random Forest, XGBoost, SVM, ANN, etc. | Test Error, Average profit, Precision, Recall, and F-score |
Gandhmal & Kumar 2019 | Goldman Sachs Software, Microsoft Corp., S&P 500, BSE, DJIA, etc. | Price Return and others | Daily, Weekly, and others | Reviewed 50 research papers published between 2010 and 2018 | ANN, SVM, Decision support system, CNN, etc. | MAPE, RMSE, Accuracy, Sensitivity, Specificity, MAE, etc. |
Zhang et al. 2020 | Four securities (codes 002570, 600422, 000049, and 002375) from the Shanghai and Shenzhen Stock Exchanges | Stock return | 5 years of daily data | January-2012 to January-2017 | SVR-ENANFIS | MSE, RMSE, MAE, MAPE |
Jiang 2020 | Datasets of 100 research papers | Stock return | All available samples | 2017–2019 | Deep Learning model | F1 score, precision, recall, MCC, RMSE, MAPE, MAE, and MSE |
Reschenhofer et al. 2020 | 58 datasets of financial time series, Yahoo Finance | Stock price return and others | 1-day, 1-week, 1-month, 1-year | 96 publications of 2016 | Tools analyzed in the reviewed papers, such as SVM, ANN, etc. | Different metrics for the different papers |
Proposed Work | SPY Stock Exchange Traded Fund (NYSE) | Close Price Prediction | 8 lakh (approx. 800,000) samples, High Frequency (1-minute data) | 3-January-2000 to 31-December-2008 (1-minute dataset) | Adaptive Trio-ensemble Deep Neural Network (ATDNN) | MSE, RMSE, MAE, RMSLE, MRD, R2 |
NN: Neural Network, SVR: Support Vector Regression, RF: Random Forest, rRMSE: Relative RMSE, NMSE: Normalized MSE, MI: Mutual Information, LSTM: Long Short Term Memory, DM: Diebold and Mariano Testing, RMSE: Root Mean Square Error, PT: Pesaran Timmermann Testing, MAD: Mean Absolute Deviation, MAPE: Mean Absolute Percentage Error, BPNN: Backpropagation Neural Network, CNN: Convolutional Neural Network, ENANFIS: ensemble adaptive neuro-fuzzy inference system, ATDNN: Adaptive trio-Ensemble Deep Neural Network.
The outcomes of this review section are summarized as follows:
- A high-frequency dataset (collected every minute or hour) can provide better prediction results than a low-frequency dataset (collected daily).
- An ensemble model has better prediction accuracy than any individual machine learning model.
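The intuition behind the ensemble outcome can be illustrated with a toy example. The sketch below uses a plain equal-weight average of three hypothetical forecasters; this is the simplest form of ensembling and is an assumption made for illustration, not the adaptive trio-ensemble (ATDNN) proposed in this paper. All prices and model outputs are made up.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def average_ensemble(*model_outputs):
    """Equal-weight average of the forecasts of several models, per time step."""
    return [sum(step) / len(step) for step in zip(*model_outputs)]

# Hypothetical close prices and three hypothetical model forecasts.
actual = [100.0, 102.0, 101.0, 105.0]
model_a = [101.0, 101.0, 103.0, 104.0]
model_b = [99.0, 104.0, 100.0, 106.0]
model_c = [100.0, 101.0, 100.0, 107.0]

ensemble = average_ensemble(model_a, model_b, model_c)

member_rmse = [rmse(actual, m) for m in (model_a, model_b, model_c)]
ensemble_rmse = rmse(actual, ensemble)

print("Member RMSEs :", [round(r, 4) for r in member_rmse])
print("Ensemble RMSE:", round(ensemble_rmse, 4))
```

In this constructed example the individual errors partly cancel, so the averaged forecast has a lower RMSE than each member. In general, the mean squared error of an equal-weight average is never larger than the average of its members' mean squared errors, but it can still be worse than the best single member, which is why the benefit has to be verified empirically.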