2.1 Literature Review
From a research perspective, China has become a world leader in AI publications and patents (Li, Tong & Xiao 2021). A review of China's research on ML algorithms in industrial applications will assist researchers and practitioners in understanding the current state of ML approaches. This literature review therefore covers top-tier publications on ML algorithms used in China's industrial applications between 2016 and 2020.
Twenty-nine algorithms are identified across 347 industrial applications (see Appendix 1):
- Back-Propagation (BP): 27.38% (95 of 347)
- Support Vector Machine (SVM): 24.50% (85 of 347)
- Linear Regression (LR): 8.65% (30 of 347)
- Perceptron: 5.19% (18 of 347)
- Recurrent Neural Networks (RNN): 4.90% (17 of 347)
- Random Forest (RF): 3.75% (13 of 347)
- Convolutional Neural Networks (CNN): 3.17% (11 of 347)
- K-means: 3.17% (11 of 347)
- AdaBoost: 2.88% (10 of 347)
- Bayesian Network: 2.59% (9 of 347)
- K-Nearest Neighbour (KNN): 2.02% (7 of 347)
- Stepwise Regression: 1.44% (5 of 347)
- Naive Bayes: 1.44% (5 of 347)
- Self-Organizing Map (SOM): 1.15% (4 of 347)
- Partial Least Squares Regression (PLSR): 1.15% (4 of 347)
- Logistic Regression: 1.15% (4 of 347)
- Learning Vector Quantization (LVQ): 0.86% (3 of 347)
- Classification And Regression Tree (CART): 0.86% (3 of 347)
- Hierarchical Clustering: 0.58% (2 of 347)
- C4.5: 0.58% (2 of 347)
- Radial Basis Function Networks (RBFN): 0.29% (1 of 347)
- Locally Weighted Learning (LWL): 0.29% (1 of 347)
- Projection Pursuit: 0.29% (1 of 347)
- Principal Component Regression (PCR): 0.29% (1 of 347)
- Partial Least Squares Discriminant Analysis (PLS): 0.29% (1 of 347)
- Linear Discriminant Analysis (LDA): 0.29% (1 of 347)
- Gradient Boosted Regression Trees (GBRT): 0.29% (1 of 347)
- Expectation Maximization: 0.29% (1 of 347)
- Ridge Regression: 0.29% (1 of 347)
Among these algorithms, nine are discussed in ten or more publications: BP, SVM, LR, Perceptron, RNN, RF, CNN, K-means, and AdaBoost (see Figure 1).
2.2 Most-used ML Algorithms
Among the nine most-used ML algorithms, AdaBoost is used for classification and regression tasks (EL Bilali et al. 2021, p.2), but as a classification-oriented method it requires a carefully designed training mechanism before it can be applied well to prediction tasks. CNN uses convolutional layers to detect patterns in input data for classification or prediction (Readshaw & Giani 2021, p.17354) and is typically applied to image processing. The K-means algorithm partitions data into a set of clusters defined by centroids, starting from initial centroid estimates generated randomly from the dataset (Srikanth, Zahoor Ul Huq & Siva Kumar 2022, p.5). AdaBoost, CNN, and K-means are therefore unsuitable for the gas warning system application in this research and will not be tested.
Besides the above widely used ML algorithms in China's research, ARIMA is also a popular algorithm in international research (Brownlee 2018) and a common approach for addressing short-term prediction problems in many studies (Kück & Freitag 2021, p.2; Pu et al. 2021, p.38). ARIMA can account for underlying trends, autocorrelation, and seasonality, and allows flexible modelling of different types of impacts (Schaffer, Dobbins & Pearson 2021, p.11). However, ARIMA cannot effectively capture all the details in very short-term forecasting (Aasim, Singh & Mohapatra 2019, p.766).
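The autoregressive backbone of ARIMA can be illustrated in a few lines. The sketch below is illustrative only (function names and the synthetic series are not from any cited study): it fits an AR(p) model by ordinary least squares, the "AR" component around which differencing ("I") and moving-average ("MA") terms are added.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model y_t = c + a1*y_{t-1} + ... + ap*y_{t-p} by least squares."""
    y = np.asarray(series, dtype=float)
    # Lagged design matrix: column k holds the values lagged by k+1 steps.
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a1, ..., ap]

def forecast_one_step(series, coef):
    """One-step-ahead forecast from the fitted AR coefficients."""
    p = len(coef) - 1
    lags = series[-1:-p - 1:-1]  # most recent p values, newest first
    return coef[0] + np.dot(coef[1:], lags)
```

Seasonal and moving-average effects require the full ARIMA machinery, but the lagged-regression core above is what gives the model its short-term predictive power.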
BP is one of the most widely used neural networks, developed originally for networks of neuron-like units (Rumelhart, Hinton & Williams 1986, p.533). Because of its simple structure, BP can effectively solve the approximation of nonlinear objective functions in fields such as system simulation, function fitting, and pattern recognition (Huang et al. 2020, p.5645). This research tests BP_Resilient and Second-Order Gradient BP (BP_SOG) as training algorithms. The main reason is that when a large network topology is selected, standard BP algorithms suffer from problems such as becoming trapped in local minima and slow convergence caused by gradients of vanishingly small magnitude (Erkaymaz 2020, p.16279). BP_Resilient is also reported to offer relatively high accuracy, robustness, and convergence speed (Sui Kim et al. 2020, p.15).
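The appeal of resilient backpropagation can be seen in miniature: it updates weights using only the sign of the gradient, with an adaptive step size, so vanishingly small gradient magnitudes cannot stall learning. The simplified one-parameter sketch below is a hypothetical illustration of this idea, not the exact BP_Resilient implementation tested in this research.

```python
def rprop_minimize(grad, w0, steps=100, step0=0.1, eta_plus=1.2, eta_minus=0.5,
                   step_max=1.0, step_min=1e-6):
    """Minimise a 1-D function with a simplified Rprop rule: the update uses
    only the SIGN of the gradient; the step size adapts when the sign flips."""
    w, step, prev_sign = w0, step0, 0
    for _ in range(steps):
        g = grad(w)
        sign = (g > 0) - (g < 0)
        if sign * prev_sign > 0:          # same direction: accelerate
            step = min(step * eta_plus, step_max)
        elif sign * prev_sign < 0:        # overshot the minimum: back off
            step = max(step * eta_minus, step_min)
        w -= sign * step                  # magnitude of g is irrelevant
        prev_sign = sign
    return w

# A gradient that is tiny everywhere (which would stall plain gradient
# descent) still drives Rprop quickly toward the minimum at w = 3.
w_star = rprop_minimize(lambda w: 1e-8 * (w - 3), w0=0.0)
```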
Although KNN is adopted in only a few of China's industrial applications (2.02%, 7 out of 347), this research tests its performance. The main reason is that KNN is simple in its workings and calculations (Uddin et al. 2022, p.2). KNN bypasses the complex equation-solving process with computational efficiency (Dong, Ma & Fu 2021, p.4; Kück & Freitag 2021, p.19) and forecasts accurately across a wide variety of datasets (Kück & Freitag 2021, p.19; Uddin et al. 2022, p.2), sometimes without any loss of accuracy (Cunningham & Delany 2022, p.22). As a non-parametric, supervised learning classifier, KNN uses proximity to make classifications or predictions about the grouping of an individual data point (Dritsas & Trigka 2022, p.11) and captures correlations by utilizing raw data characteristics (Dong, Ma & Fu 2021, p.4). It has been widely used in forecasting applications in economics, finance, production, and natural systems (Kück & Freitag 2021, pp.2-3).
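KNN's simplicity is easy to demonstrate: prediction needs no model fitting or equation solving, only distances to the stored training points. A minimal sketch (names and data are illustrative):

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points
    (Euclidean distance) -- no training phase beyond storing the data."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k
```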
LR is a supervised learning technique and one of the most fundamental algorithms in statistics and ML-related fields (Alhakamy et al. 2023, p.2; Dritsas & Trigka 2022, p.10; Mazumder & Wang 2023, p.1226). Its mathematical infrastructure is not complex (Şahin et al. 2023, p.4906). It is therefore a powerful tool for various tasks in computer vision (Li et al. 2023, p.732) and is widely used for predicting and estimating a dependent variable from a given set of independent features (Alhakamy et al. 2023, p.2; Dritsas & Trigka 2022, p.10).
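LR's uncomplicated mathematical infrastructure amounts to solving a least-squares problem. A minimal sketch (illustrative only, not code from any cited study):

```python
import numpy as np

def linear_regression(X, y):
    """Ordinary least squares: find coefficients (intercept first) of
    y ≈ b0 + b1*x1 + ... + bn*xn by solving the least-squares problem."""
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```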
LSTM is also tested in this research because, although it had not been reported in China's industrial applications up to 2020, it is well known for text classification (Butt et al. 2023, p.3040) and has been used for forecasting more frequently than other algorithms (Elsaraiti & Merabet 2021, p.15). LSTM is a special kind of RNN (Butt et al. 2023, p.3055; Mahmoud et al. 2022, p.405; Van Houdt, Mosquera & Nápoles 2020, p.5931) that can overcome the exploding/vanishing gradient problems that typically arise when learning long-term dependencies, even when the minimal time lags are very long (Sherstinsky 2020, p.12; Van Houdt, Mosquera & Nápoles 2020, p.5931).
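The gating mechanism that lets LSTM carry information across long time lags can be sketched as a single forward step. This is a simplified illustration with hypothetical variable names; practical implementations add careful initialisation, batching, and training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [x, h_prev] to the four stacked gate
    pre-activations; the gates regulate the cell state c, whose additive
    update is what eases gradient flow over long sequences."""
    n = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[:n])             # forget gate: how much old state to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new input to admit
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much state to expose
    g = np.tanh(z[3 * n:])         # candidate cell update
    c = f * c_prev + i * g         # additive state update
    h = o * np.tanh(c)             # new hidden state
    return h, c
```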
The perceptron is one of the most straightforward ANN architectures (Sharma, Kim & Gupta 2022, p.5) and the most typical type of neural predictive network (Moayedi et al. 2021, p.4). In its multilayer form, it can approximate any continuous function and can use any arbitrary activation function (Dritsas & Trigka 2022, p.11). It can solve problems that are not linearly separable (Cabeza-Ramírez et al. 2022, p.13; Dritsas & Trigka 2022, p.11) and can produce efficient solutions to problems whose complexity overwhelms conventional computing methods (Calude, Heidari & Sifakis 2023, p.844; Li et al. 2022, p.2).
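The single-layer perceptron learning rule is short enough to state directly. The sketch below (illustrative data and names) nudges the weights on each misclassified sample and, by the perceptron convergence theorem, terminates on linearly separable data.

```python
def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Single-layer perceptron learning rule: adjust the weights whenever a
    sample is misclassified. Labels are in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, labels):
            # Misclassified (or on the boundary): move the boundary toward x.
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b
```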
RF is a tree-based, supervised learning algorithm built from several decision trees (Dritsas & Trigka 2022, p.11; Mahmoud et al. 2022, p.404; Pacheco et al. 2021, p.7). It is used in classification and regression problems (Dritsas & Trigka 2022, p.11) and for addressing short-term prediction problems (Pu et al. 2021, p.38). RF is robust to data of any distribution drawn from a large number of features and can capture non-linear effects and complex interactions without prior specification (Smithies et al. 2021, pp.2, 8), which may yield more accurate and stable forecasts (Pacheco et al. 2021, p.7). One limitation of RF is that feature-importance measures can be biased when features are correlated (Smithies et al. 2021, p.8).
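The bootstrap-aggregation idea behind RF can be sketched with depth-1 trees. A full RF additionally samples features at each split and grows deeper trees, so the following is only a toy illustration (all names and data are hypothetical):

```python
import numpy as np

def fit_stump(x, y):
    """Depth-1 regression tree: pick the split minimising squared error."""
    best = None
    for s in np.unique(x):
        left, right = y[x <= s], y[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, s, lo, hi = best
    return lambda q: lo if q <= s else hi

def bagged_forest(x, y, n_trees=25, seed=0):
    """Each tree sees a bootstrap resample of the data; the ensemble
    averages their predictions -- the aggregation idea behind RF."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), len(x))   # bootstrap sample
        trees.append(fit_stump(x[idx], y[idx]))
    return lambda q: sum(t(q) for t in trees) / len(trees)
```

Averaging over resampled trees is what gives the ensemble its stability relative to a single decision tree.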
RNN is one of the most powerful algorithms for processing sequential data such as time series (Elsaraiti & Merabet 2021, p.2; Wang et al. 2022, p.463). It can predict multiple future time steps (Wang et al. 2022, p.463) and is considered a competitive alternative for forecasting time series (Šestanović & Arnerić 2021, p.10).
SVM is one of the most practical outcomes of statistical learning theory (Huang et al. 2020, p.5465) and is primarily used for classification problems (Dritsas & Trigka 2022, p.10). Recent studies show that SVM can produce predictions of high accuracy (Essam et al. 2022, p.3884): it constructs the line or decision boundary that best separates the n-dimensional feature space into classes, so that new data points can easily be assigned to the correct category (Dritsas & Trigka 2022, p.10). However, SVM's predictive ability degrades when the data set is significantly noisy, as SVMs are sensitive to noise (Essam et al. 2022, p.2).
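The decision-boundary construction described above can be sketched for the linear case with a hinge-loss subgradient scheme. This is a simplified, Pegasos-style illustration under assumed toy data; production SVMs use dedicated solvers and kernel functions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.001, lr=0.1, epochs=500, seed=0):
    """Minimise hinge loss + L2 penalty by stochastic subgradient descent.
    The resulting hyperplane w·x + b separates the classes (y in {-1, +1})
    while the L2 shrinkage encourages a wide margin."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            w -= lr * lam * w                 # shrink toward a wider margin
            if margin < 1:                    # hinge loss active: push point out
                w += lr * y[i] * X[i]
                b += lr * y[i]
    return w, b
```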
Thus, this research focuses on ten algorithms: ARIMA, BP_Resilient, BP_SOG, KNN, LR, LSTM, Perceptron, RF, RNN, and SVM.