Optimized Extreme Learning Machine (ELM) Based on Genetic Algorithm (GA) to Predict Carbon Price Under the Influence of Multiple Factors

Abstract: The promotion of the carbon market can accelerate the low-carbon transformation of China's economic structure and achieve more efficient carbon emission reduction. Accurate carbon price prediction helps improve risk management in the carbon market and the decision-making of investors, but it also poses great challenges to industry practitioners and the government. In this paper, a new hybrid model is proposed, which combines complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and an extreme learning machine (ELM) optimized by a genetic algorithm (GA). This paper is the first to study the application of GA-ELM to carbon price prediction. Eight intrinsic mode functions and one residual are obtained by CEEMDAN, and the partial autocorrelation function (PACF) is then used to determine the partial correlation between each sequence and its lagged data, which are taken as internal factors affecting the prediction. At the same time, energy, economic and social factors are selected as the external factors affecting the prediction, and the carbon price is predicted from the internal and external factors together. The results indicate that the model successfully overcomes the challenge of carbon price prediction based on multiple influencing factors. The hybrid model shows superiority in Beijing, Shanghai and Guangdong. The results show that the proposed model has the best prediction performance among the 15 models compared, and that decomposing the carbon price improves prediction accuracy. In addition, the CEEMDAN-GA-ELM model copes better with multiple influencing factors than the other models. This model provides a novel and effective tool for the government and enterprises to predict the carbon price.


(Liu and Shen, 2020) used a fuzzy C-means clustering algorithm to divide these sub-components into trend, low-frequency and high-frequency components for prediction to improve the prediction

Different influencing factors may lead to different prediction effects, so it is very important to select appropriate influencing factors and process them, which also increases the difficulty of the research (Zhu et al., 2017). In prediction research based on influence factor analysis, the grey correlation method can be used to screen out the factors that are highly correlated with the explained variable, while factor analysis is often used when the data are highly repetitive, since it reduces the data dimension well. (Zhu et al., 2021)

Factor analysis is also used by (Sun and Wang, 2020), who take the extracted special factors as input variables and use a least squares support vector machine improved by particle swarm optimization to make predictions. From these studies, it is found that factor analysis and grey correlation can screen out the target factors well and reduce data redundancy, thereby reducing the difficulty of prediction and improving its accuracy. This is also an important reason why they are used in this paper to select the prediction input.
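As a rough illustration of how grey correlation screening ranks candidate factors, the sketch below computes Deng's grey relational grade of each factor against a reference series. The function name, the min-max scaling, the distinguishing coefficient rho = 0.5 and the toy series are all assumptions made for illustration; the paper's own preprocessing and data are not visible in this excerpt.

```python
def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each factor series against the reference series.

    reference: list of floats (for example a carbon price series)
    factors:   dict mapping a factor name to a list of floats of the same length
    rho:       distinguishing coefficient (0.5 is a common default, assumed here)
    """
    def min_max(series):
        # Scale to [0, 1] so that series with different units are comparable.
        lo, hi = min(series), max(series)
        if hi == lo:
            raise ValueError("a constant series cannot be min-max scaled")
        return [(v - lo) / (hi - lo) for v in series]

    ref = min_max(reference)
    # Absolute difference between reference and factor at every time step.
    diffs = {}
    for name, series in factors.items():
        scaled = min_max(series)
        diffs[name] = [abs(r - s) for r, s in zip(ref, scaled)]

    # Global minimum and maximum difference over all factors and time steps.
    flat = [d for ds in diffs.values() for d in ds]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        return {name: 1.0 for name in diffs}

    grades = {}
    for name, ds in diffs.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)  # mean relational coefficient
    return grades


# Toy numbers for illustration only (not data from the paper).
price = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "factor_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # moves exactly with the price
    "factor_b": [5.0, 1.0, 4.0, 2.0, 3.0],   # moves erratically
}
grades = grey_relational_grades(price, candidates)
ranked = sorted(grades, key=grades.get, reverse=True)
```

Factors with a grade close to 1 track the reference series closely after scaling, so a screening rule would keep the top-ranked factors; where to place the cut-off is a modelling choice.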

CEEMDAN is an advanced data denoising method (Lu et al., 2020). When separating electroencephalogram (EEG) data, (Wu et al., 2021) found that CEEMDAN solved the modal aliasing problem while retaining most of the original EEG signal components, and the results showed that this method separated EEG artefacts better than previous studies.

Given the shortcomings of existing research, this paper selects three aspects, namely energy, economy and society, to study the performance of the CEEMDAN-GA-ELM model in carbon price prediction under the influence of these factors. The abbreviations used in this paper are explained in Table 1.

The innovations and contributions of this paper are mainly reflected in the following aspects:

(1) A carbon price prediction model based on influence factor analysis and the CEEMDAN-GA-ELM model is proposed for the first time. Optimizing the ELM with GA improves both the stability and the accuracy of the prediction. The model was tested in Beijing, Guangdong and Shanghai, where it achieved the best prediction effect among the models compared. Applying the CEEMDAN-GA-ELM model to carbon price prediction can overcome the difficulties caused by many factors; the model has good applicability and can be extended to other pilot projects.

(2) Decomposing the carbon price series with CEEMDAN can improve prediction accuracy. In the Beijing pilot, the decomposition effect of CEEMDAN is better than that of EEMD. Combining CEEMDAN with different prediction methods improves the prediction effect, and the best prediction effect is achieved when it is combined with GA-ELM. This study enriches the practice of the CEEMDAN algorithm and provides a valuable reference for research on decomposition-prediction methods.

(3) Internal and external influencing factors are combined to predict the carbon price. PACF is used in this paper to determine the partial correlation between each sequence decomposed by CEEMDAN and its lagged data, which is used as an internal factor affecting the

The rest of this paper is structured as follows. The theories and methods used in this paper are introduced in Section 2. The construction process of the hybrid model is explained in Section 3.

The empirical research and the analysis of the results are presented in Section 4. The conclusions of this paper are given in Section 5.

The implementation steps of CEEMDAN for a signal $x(t)$ are as follows.

(1) Generate signal sets containing noise.

(2) The first-order IMF ($IMF_1$) of each sample is obtained by applying EMD to each noisy signal, and the mean of these values is taken as the first-order IMF of $x(t)$.

The final residual is as follows:

$$R(t) = x(t) - \sum_{k=1}^{K} IMF_k(t)$$

The signal $x(t)$ can be expressed as:

$$x(t) = \sum_{k=1}^{K} IMF_k(t) + R(t)$$

where $K$ is the number of IMFs.

The IMF components of each order obtained by CEEMDAN represent different frequency components of the original signal. The energy difference among the components of traditional EEMD is large, and the frequency mixing region is large. By contrast, the IMF components obtained by the CEEMDAN method are balanced in energy and have a narrow frequency aliasing region, so the different frequency components of non-stationary signals are resolved better.

For $N$ samples $(x_i, t_i)$ with $n$ inputs and $m$ outputs and an ELM with $L$ hidden nodes, the activation function $g(\cdot)$ in the hidden layer is Sigmoid, and the network output is expressed as follows:

$$\sum_{j=1}^{L} \beta_j \, g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, \dots, N$$

In the formula, $w_j = [w_{j1}, w_{j2}, \dots, w_{jn}]^T$ is the connection weight between the input layer and the $j$th hidden-layer node; $\beta_j = [\beta_{j1}, \beta_{j2}, \dots, \beta_{jm}]^T$ is the connection weight between the $j$th hidden-layer node and the output layer; $b_j$ is the bias of the $j$th node in the hidden layer.

According to the zero-error approximation principle, there exist $\beta_j$, $w_j$ and $b_j$ such that the standardized form simplifies to

$$H\beta = T$$

$$H(w_1, \dots, w_L, b_1, \dots, b_L, x_1, \dots, x_N) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}$$

where $H$ is the output matrix of the hidden layer and $T$ is the matrix of targets. Once the input weights $w_j$ and the hidden-layer biases $b_j$ are randomly determined, the output matrix $H$ of the hidden layer is uniquely determined.

Since the hidden-layer input weights and biases in the ELM model are given randomly, some of the randomly set values may be 0, which causes some hidden-layer nodes to fail. Therefore, this paper adopts the genetic algorithm to optimize the input weights and biases: the optimal initial weights and thresholds are obtained through the selection, crossover and mutation operations of the genetic algorithm, and the optimal ELM model is then obtained.
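For reference, the step that turns a fixed hidden-layer output matrix into a trained ELM is not visible in this excerpt. In the standard ELM formulation, which this section appears to follow, the output weights are the minimum-norm least-squares solution of the linear system $H\beta = T$. The block below states that standard result under the assumption that the paper uses it unchanged; the symbols $\hat{\beta}$ and $H^{\dagger}$ are notation introduced here.

```latex
\hat{\beta} = \arg\min_{\beta} \lVert H\beta - T \rVert = H^{\dagger} T ,
\qquad
H^{\dagger} = (H^{T} H)^{-1} H^{T} \quad \text{when } H^{T} H \text{ is invertible},
```

Here $H^{\dagger}$ denotes the Moore-Penrose generalized inverse of $H$. Because only $\hat{\beta}$ is solved for, the quality of the fit depends on the randomly drawn $w_j$ and $b_j$, which is the motivation for searching over them with GA.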

The training steps are as follows:

(1) The fitness function, the population size k and the number of generations p are set. In this paper, the mean squared error on the test-set samples is selected as the fitness function.

The smaller the fitness function value, the more accurate the model.

(4) The optimal fitness function is solved globally.
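The selection, crossover and mutation loop behind these training steps can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `genetic_minimise`, tournament selection, single-point crossover, Gaussian mutation, elitism and all default parameter values are assumptions. The fitness used in the example is a stand-in quadratic; in the paper's setting each chromosome would encode the ELM input weights and biases, and the fitness would be the resulting mean squared error.

```python
import random


def genetic_minimise(fitness, n_genes, pop_size=30, generations=50,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimise `fitness` over real-valued chromosomes with a simple GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]

    def tournament():
        # Selection: pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) < fitness(b) else b

    best = min(population, key=fitness)
    history = [fitness(best)]  # best fitness per generation
    for _ in range(generations):
        children = [best[:]]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: perturb each gene with a small probability.
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history


def stand_in_fitness(chromosome):
    # Placeholder for the ELM mean squared error (smaller is better).
    return sum(g * g for g in chromosome)


best, history = genetic_minimise(stand_in_fitness, n_genes=5)
```

Because the best individual is copied into every new generation, the recorded best fitness never increases, which makes the convergence history easy to inspect.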

After each solution of the optimal fitness function, crossover and mutation are adopted to

Step 1: Determination of correlation, represented by an orange wireframe in the graphical abstract. GRA is used to analyze the correlation between the energy, economic and social factors and the carbon price.

Step 2: Data preprocessing, represented by a purple wireframe in the graphical abstract. Stock

Step 3: Factor analysis, shown in yellow in the graphical abstract. Too many prediction inputs will affect the prediction accuracy. Factor analysis is used to reduce the dimension of the social factors and thereby the redundancy and repetition of information. The new factors replace the original factors as one of the external influencing factors.

Step 4: Decomposition of the carbon price, represented in blue in the graphical abstract. To reduce the difficulty of prediction, CEEMDAN is used to decompose the carbon price, and 8 intrinsic mode functions and one residual are finally obtained.

Step 5: Selection of the prediction input, indicated by a black wireframe in the graphical abstract. PACF is used to determine the relationship between each intrinsic mode function and its historical data, and the lagged data that are highly correlated with the analyzed sequence are selected as the internal input for prediction.

Step 6: Forecast, shown in green in the graphical abstract. GA-ELM was used to predict the

Table 2 lists all the data sources used in this article, and Table 3 shows the carbon price data

By extracting a few common factors to replace the original indicators, factor analysis can reduce the redundancy of influencing factors and achieve dimension reduction. The extraction of common factors can effectively reduce the prediction error (Mingxing et al., 2009).
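Returning to the PACF-based input selection of Step 5, the sketch below computes sample partial autocorrelations with the Durbin-Levinson recursion and keeps the lags whose value lies outside an approximate 95% band. The function names, the band of 1.96 divided by the square root of the series length, and the synthetic AR(1) series are assumptions for illustration; the excerpt does not state which threshold the paper applies.

```python
import math
import random


def autocorrelations(series, max_lag):
    """Sample autocorrelations r[0..max_lag] of a list of floats."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    denom = sum(c * c for c in centred)
    return [sum(centred[t] * centred[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = autocorrelations(series, max_lag)
    phi_prev = []   # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    result = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi = [phi_kk]
        else:
            num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
        phi_prev = phi
    return result


def select_lags(series, max_lag):
    """Lags whose partial autocorrelation exceeds the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(series))  # assumed significance threshold
    values = pacf(series, max_lag)
    return [lag for lag, value in enumerate(values, start=1) if abs(value) > bound]


# Synthetic AR(1) series standing in for one decomposed component.
rng = random.Random(0)
component = [0.0]
for _ in range(499):
    component.append(0.8 * component[-1] + rng.gauss(0.0, 1.0))
selected = select_lags(component, max_lag=5)
```

For each decomposed component, the selected lags would define which past values enter the predictor as internal inputs, alongside the external factors.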

The nine factors selected in this paper bring high information dimension, high redundancy and high repeatability. Therefore, we use factor analysis to reduce the dimension of the relevant data. This paper will conduct a factor analysis of the five Baidu search

can be carried out. Table 5 shows the results of the KMO and Bartlett tests. Table 6 shows the results of the factor analysis.
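The KMO statistic is not defined in this excerpt. Under the usual Kaiser-Meyer-Olkin definition, which is assumed here to be what SPSS reports, it compares the simple correlations $r_{ij}$ between variables with their partial correlations $p_{ij}$; the symbols are notation introduced for this note.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}}
```

The value lies between 0 and 1. Values closer to 1 mean that the partial correlations are small relative to the simple correlations, so the variables share enough common variance for factor analysis to be meaningful.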

Factor analysis in SPSS shows that KMO = 0.774 and that the significance level is less than 0.01,

The resulting factors will be used as one of the prediction inputs.

Table 8 and Figure 4 show the prediction results of all models in the Beijing pilot project, and Table 9 shows the forecast results for Guangdong and Shanghai.

From the prediction results, we can find that:

(2) Optimizing BP and ELM can improve prediction accuracy. GA-ELM is better than

prediction accuracy but also a more stable prediction effect.