Point and interval forecasting for carbon trading price: a case of 8 carbon trading markets in China

Carbon trading price (CTP) prediction accuracy is critical for both market participants and policymakers. To date, most studies have focused on only one or a few carbon trading markets, so the universality of their models has not been sufficiently validated. Using a case study of all carbon trading markets in China, this study proposes a hybrid point and interval CTP forecasting model. First, the Pearson correlation method is used to identify the key influencing factors of CTP. The original CTP data are then decomposed into multiple series using complete ensemble empirical mode decomposition with adaptive noise. Next, the sample entropy method is used to reconstruct the series to reduce computational time and avoid overdecomposition. A long short-term memory network optimized by the Adam algorithm is then established to achieve point forecasting of CTP. Finally, the kernel density estimation method is used to predict CTP intervals. The results demonstrate the proposed model's validity and superiority; moreover, the interval prediction model reflects the uncertainty of market participants' behavior, which is more practical in the operation of carbon trading markets.


Background and motivation
Carbon markets are regarded as the most effective way to promote carbon emission reduction and address climate problems due to their low cost and sustainability (Zhao et al. 2023). According to the International Carbon Action Partnership, there were 24 carbon markets in operation around the world as of January 31, 2021, with another 8 carbon markets in the planning stages and 14 jurisdictions considering the establishment of carbon markets. China, as one of the world's largest carbon emitters (Zheng et al. 2019), established seven carbon trading markets in 2013, namely, Beijing, Shanghai, Guangdong, Tianjin, Shenzhen, Chongqing, and Hubei. Then, Fujian province became China's eighth pilot region for carbon emissions trading in December 2016 (Ding et al. 2019). Competition among market participants is unavoidable during the operation of the carbon trading market. Accurate carbon trading price (CTP) prediction has become a critical factor in the formulation of competition strategies by all market participants. Governments and policymakers can strengthen supervision and regulate participant behavior in a timely manner in response to changes in CTPs, promoting the long-term safe, stable, and orderly development of the carbon market (Huang and He 2020). For enterprises, the results of carbon price predictions can be incorporated into the investment decision-making process, significantly improving the enterprise's intelligent management of carbon assets. Precise CTP forecasting can not only increase revenue but also reduce business risks.

CTP
Several pilot carbon markets, including Beijing, Tianjin, Shenzhen, Shanghai, and Guangdong, have been in operation since 2013. Each market's economic structural characteristics and resource endowments are very different, which causes their prices to follow different trends. Figure 1 depicts the CTPs of the eight carbon markets from 2017 to 2020. Because the Shenzhen carbon market has many different types of transactions, its price is based on the average price of those transactions.
The descriptive statistics of carbon prices in eight carbon markets are shown in Table 1.
Carbon prices in different regions show different trends, as shown in Figure 1 and Table 1.
In terms of carbon price trends, the CTP in Shenzhen and Fujian showed an overall downward trend (the average carbon price in Shenzhen decreased from 33.26 yuan in 2017 to 25.34 yuan in 2020, and that in Fujian decreased from 30.07 yuan in 2017 to 13.51 yuan in 2020), while the CTP in the other carbon markets maintained an upward trend. Except for the steady increase in CTP volatility in Shenzhen, there were no obvious patterns in the volatility of carbon prices in the other carbon markets. Carbon prices in Guangdong and Hubei were relatively stable by 2020, with standard deviations of 1.10 and 1.62, respectively, whereas carbon price fluctuations were pronounced in Shenzhen and Chongqing, with standard deviations of 9.82 and 8.64, respectively.
The prediction performance will be affected to some extent by the stability of the data. It is relatively difficult to achieve accurate predictions of carbon prices in carbon markets with large fluctuations in carbon prices.

Literature review
Based on the aforementioned background, CTP forecasting has attracted increasing research attention in recent years. Sun and Zhang proposed an ELM-AWOA model for CTP prediction in four Chinese pilots and the EU ETS (Sun and Zhang 2018), and an LSSVR-based approach has also been applied to carbon price forecasting (Zhu et al. 2019). Although machine learning methods such as ELM and LSSVR are widely used in CTP prediction, they still have limitations in processing CTP data characterized by large samples and high volatility. Hence, some scholars introduced deep learning algorithms into CTP prediction. Huang et al. predicted the EUA futures price based on a CNN model, which was shown to perform better than five traditional statistical and machine learning models (Huang et al. 2022). Yang et al. proposed an LSTM-IWOA model to forecast the CTP of the Beijing, Fujian, and Shanghai carbon markets, with lower forecasting errors than the traditional BP and ELM models. Compared with traditional machine learning models, the LSTM model has a unique network structure that achieves higher prediction accuracy and forecasting ability. Thus, the LSTM model is also applied in this paper.
In addition to the continuous exploration of prediction models, some scholars also try to reduce the data volatility of CTP. Zhao et al. introduced the HP filter model to decrease the volatility and randomness of CTP data in EU ETS (Zhao et al. 2021). However, the HP filter model is prone to inflict problems of introducing spurious dynamic relationships that are unrelated to the underlying data generation process (Hamilton 2017). Zhu et al. decomposed the nonstationary and nonlinear carbon price of EU ETS by empirical mode decomposition (EMD) approach (Zhu et al. 2017). However, the modal aliasing phenomenon will occur once the intermittent signal appears in the actual signal. The specific manifestation is that multiple scale components exist in one intrinsic mode function (IMF) at the same time or that one scale component exists in multiple IMFs at the same time (Jia et al. 2021). Chen et al. employed the ensemble empirical mode decomposition (EEMD) approach to decompose the CTP data of EU ETS (Chen et al. 2022a). Although the EEMD can effectively reduce the occurrence of mode mixing, it also introduces new challenges. The white Gaussian noise added to the EEMD must be averaged repeatedly, and the error is primarily determined by the number of integrations (Gao et al. 2020). Lu et al. adopted the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to decompose the CTP data of eight carbon markets in China, which can overcome the aforementioned problem (Lu et al. 2020). However, if the decomposed sequences are directly entered into the prediction model, the computation time will be significantly increased, and the overdecomposition phenomenon will occur on occasion. Therefore, some scholars tried to introduce some data reconstruction models into CTP forecasting (Hao and Tian 2020), such as sample entropy (SE). 
Considering the advantages and limitations of the above methods, the CEEMDAN-SE method is used for data decomposition and reconstruction in this paper.
Through the review of the existing literature, we found that carbon price prediction application scenarios focus primarily on the EU ETS or a few carbon markets in China. However, different carbon markets have different operating mechanisms, as well as different price volatility and changing trends. It is difficult to guarantee that a proposed model can accurately fit carbon prices in other regions if only a few carbon markets' price data are simulated and analyzed. In addition, the existing literature primarily focuses on carbon price point forecasting. Because market entities' trading behaviors are complex and changeable, it is difficult to reflect the uncertainty of the carbon market through point prediction of carbon trading prices. As a result, considering the uncertainty in the carbon trading price, interval prediction is more practical and realistic. There are currently few studies on interval carbon price prediction; however, interval prediction has been applied more widely in fields such as metal prices, oil prices, and wind speed (Du et al. 2020; Ding and Meng 2020).
This study, based on previous research, proposes a novel hybrid system comprising point and interval carbon trading price forecasting models. First, the influencing factors are classified as relevant or irrelevant using the Pearson correlation method. The initial carbon price data are then decomposed and reconstructed into several series using the CEEMDAN-SE approach. The series are then fed into the LSTM model optimized by the Adam algorithm. Finally, the kernel density estimation (KDE) method is used to forecast the confidence intervals of the CTP. This study's main contributions and innovations are as follows: (1) A forecast study is conducted for all of China's carbon trading markets, and the proposed model performs well in most of them, making the forecasting outcomes far more convincing. (2) This paper innovatively proposes a comprehensive CTP prediction model system and framework covering indicator screening, data decomposition and reconstruction, point prediction, and interval prediction. On the one hand, the proposed CTP point prediction model outperforms the other four benchmark models. On the other hand, interval prediction for CTP is more practical than point prediction and can be widely used by market participants in carbon market operation.
The remaining sections of this research are organized as follows. Following the introductory section, Section 2 elaborates on the basic methodology used in this research. The established point and interval price prediction model is primarily introduced in Section 3. Section 4 contains the empirical analysis and forecasting preparation. Section 5 introduces the results and discussion of point and interval prediction. The conclusions are obtained in Section 6.

Pearson correlation method
The choice of influencing factors is critical in predicting carbon trading prices. If all of the influencing factors are used as inputs to subsequent prediction models without treatment, the network structure of the model will become more complex, reducing the model's prediction accuracy significantly. The Pearson correlation method determines the correlation coefficient between two variables as the quotient of their covariance and standard deviations, and is widely used in prediction due to its convenience (Kim et al. 2020; Jebli et al. 2021):

$$r(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{1}$$

where cov(X, Y) represents the covariance and σ_X and σ_Y, respectively, represent the standard deviations of the two variables. The value of r(X, Y) ranges from −1 to 1; r(X, Y) = −1 and r(X, Y) = 1 indicate completely negative and completely positive correlations, respectively, and r(X, Y) = 0 indicates no correlation.
Indicators that meet the significant p value requirements can be used as inputs for the prediction model, while indicators that do not meet the significance requirements are discarded (He et al. 2022).
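The screening rule above (keep indicators significant at p < 0.1) can be sketched with SciPy's `pearsonr`, which returns both the correlation coefficient and the p value; the factor names and synthetic series below are illustrative, not the paper's actual data.

```python
# Hedged sketch of the factor-screening step: compute Pearson r and its
# p value for each candidate factor and keep those significant at p < 0.1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ctp = rng.normal(50, 5, 200)                    # synthetic carbon price series
factors = {
    "EUAP": 0.8 * ctp + rng.normal(0, 2, 200),  # strongly correlated factor
    "humidity": rng.normal(60, 10, 200),        # unrelated factor
}

relevant, irrelevant = [], []
for name, series in factors.items():
    r, p = stats.pearsonr(ctp, series)
    (relevant if p < 0.1 else irrelevant).append(name)

print(relevant)
```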

The CEEMDAN approach and SE method
To overcome the problems of mode aliasing (EMD) and poor decomposition completeness (EEMD), Torres et al. proposed the CEEMDAN approach in 2011 (Torres et al. 2011). The essence of the CEEMDAN method is to add a finite number of realizations of adaptive white noise at each stage. The CEEMDAN procedure is as follows:

Step 1: Add a series of adaptive white Gaussian noise to the original sequence x(t):

$$x^i(t) = x(t) + \omega_0 \varepsilon^i(t), \quad i = 1, 2, \ldots, I$$

where ω_0 represents the noise coefficient, ε^i(t) represents the i-th added white Gaussian noise, x^i(t) represents the time series after adding the i-th white Gaussian noise, and I represents the number of integrations.

Step 2: Decompose each x^i(t) by the EMD method and take the mean value of the first IMF components c_1^i(t):

$$c_1(t) = \frac{1}{I}\sum_{i=1}^{I} c_1^i(t)$$

Step 3: Eliminate c_1(t) from the original sequence x(t) to get the first residual sequence:

$$r_1(t) = x(t) - c_1(t)$$

The second IMF component is then obtained by decomposing the noise-added residual:

$$c_2(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\big(r_1(t) + \omega_1 E_1(\varepsilon^i(t))\big)$$

where E_j(⋅) represents the j-th IMF component decomposed by the EMD method.

Step 4: Repeat the following procedure to calculate the remaining IMF components:

$$r_k(t) = r_{k-1}(t) - c_k(t)$$
$$c_{k+1}(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\big(r_k(t) + \omega_k E_k(\varepsilon^i(t))\big), \quad k = 2, \ldots, K \tag{2}$$

where K represents the total number of modes. The approach ends when the residual sequence cannot be decomposed further. The final residual can be expressed as:

$$R(t) = x(t) - \sum_{k=1}^{K} c_k(t)$$

However, the overdecomposition problem caused by the decomposition method often reduces calculation efficiency and prediction accuracy. To address this, Pincus proposed the approximate entropy (ApEn) method in 1991, which measures the similarity of two distinct time series so that the decomposed data can be reconstructed into new sequences (Pincus 1991). However, the ApEn method depends on the length of the data, and its results are not consistent. Richman and Moorman introduced a new method called SE in 2000 based on ApEn (Richman and Moorman 2000). The SE method's procedure is as follows:

Step 1: For a time series {x(1), x(2), …, x(N)}, construct the vector sequences

$$X_m(i) = \{x(i), x(i+1), \ldots, x(i+m-1)\}, \quad i = 1, \ldots, N-m+1$$

where each vector represents m consecutive values of x starting from point i.

Step 2: Define and calculate the distance between X_m(i) and X_m(j):

$$d[X_m(i), X_m(j)] = \max_{k=0,\ldots,m-1} |x(i+k) - x(j+k)|$$

Step 3: Count the number of j (j ≠ i) for which d[X_m(i), X_m(j)] < r for every i value and calculate the ratio with N − m + 1 to obtain B_i^m(r). Then, calculate the mean value of B_i^m(r):

$$B^m(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} B_i^m(r)$$

Step 4: Increase the dimension to m + 1 and repeat Steps 1–3 to get the mean value B^{m+1}(r).

Step 5: The SE can be expressed as

$$SE(m, r) = \lim_{N\to\infty}\left[-\ln\frac{B^{m+1}(r)}{B^m(r)}\right]$$

When the length of the time series is a finite N, the value of SE can be obtained as follows:

$$SE(m, r, N) = -\ln\frac{B^{m+1}(r)}{B^m(r)}$$

According to the parameter settings in existing references, m is set to 2, and r is set to 0.2 times the standard deviation in this study (Wu and Lin 2019). Compared with the traditional ApEn method, the main innovations of the SE method are as follows: (1) The ApEn method is dependent on the length of the data, whereas the SE method is not.
(2) The SE method is more consistent, which means that changes in parameters m and r have the same effect on the method (Sun and Wang 2018).
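The SE computation described above can be sketched in NumPy as follows, with m = 2 and r = 0.2 times the standard deviation to match the parameter settings in the text; this is an illustrative implementation, not the authors' code, and it excludes self-matches (a common convention).

```python
# Minimal NumPy sample entropy following the steps above (m = 2,
# r = 0.2 * std). Self-matches are excluded; the shared normalizations
# approximately cancel in the B^{m+1}/B^m ratio.
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_frac * np.std(x)

    def match_count(m):
        # All N - m + 1 template vectors of length m
        templates = np.array([x[i:i + m] for i in range(n - m + 1)])
        total = 0
        for i in range(len(templates)):
            # Chebyshev distance between template i and all templates
            d = np.max(np.abs(templates - templates[i]), axis=1)
            total += np.sum(d < r) - 1          # subtract the self-match
        return total

    # SE = -ln(B^{m+1} / B^m)
    return -np.log(match_count(m + 1) / match_count(m))

t = np.linspace(0, 10 * np.pi, 500)
rng = np.random.default_rng(1)
print(sample_entropy(np.sin(t)))                # regular signal: low entropy
print(sample_entropy(rng.normal(size=500)))     # noisy signal: higher entropy
```

A regular (periodic) series yields a much lower SE than white noise, which is exactly the property the reconstruction step exploits when grouping IMFs by complexity.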

LSTM method
Deep learning has become increasingly popular in a variety of fields in recent years. It requires very little manual engineering and can easily leverage increases in available computation and data (Lecun et al. 2015). Different deep learning algorithms are used in various fields based on different network architectures. A convolution neural network (CNN), for example, has become a research hotspot in the fields of speech analysis and image recognition, while deep belief network (DBN) is widely used in text detection and emotion recognition. A neural network that models sequence data is known as a recurrent neural network (RNN). Information within its layers or between layers can be transmitted more efficiently in both directions. Each neuron's output in the RNN has a relationship with the previous output, and the nodes between the hidden layers are connected. As a result, RNN has a memory function and is well suited to modeling temporal behavior (Fekri et al. 2021). Figure 2 depicts a typical RNN structure.
The basic structure of the RNN can be expressed by the following formula:

$$h_t = f_w(h_{t-1}, x_t)$$

where h_{t−1} represents the previous state, h_t represents the new target state, x_t represents the current input vector, and f_w represents the weight parameter function.
The target value has a functional relationship with the current input and the previous moment's result. Although the RNN has a memory function, its simple hidden layer structure and large time spans easily cause long-term dependence problems (Zhao et al. 2020), which result in vanishing or exploding gradients (Duan et al. 2021), so the RNN cannot learn information over long sequences.
The LSTM neural network is a type of time RNN that can take into account the recurrent and memory effects of time series data to avoid the problem of long-term dependence (Qiao et al. 2020). The LSTM neural network is made up of a series of recursively connected blocks known as memory blocks (MBs). Each MB is made up of three gates: an input gate, a forget gate, and an output gate. Figure 3 depicts the LSTM structure.
The input gate and the candidate status of the memory unit are determined by the input value x_t at time t and the hidden layer output h_{t−1} at time t − 1. The calculation formulas for the input gate i_t and the candidate status of the memory unit ĉ_t are as follows:

$$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\hat{c}_t = \tanh(w_c \cdot [h_{t-1}, x_t] + b_c)$$

where w_i is the weight matrix of the input gate i_t at time t, w_c is the weight matrix of the candidate status ĉ_t at time t, and b_i and b_c are offsets; σ denotes the sigmoid activation function and tanh the hyperbolic tangent activation function.
The forget gate is determined by the input value x_t at time t and the hidden layer output h_{t−1} at time t − 1. The expression of the forget gate is:

$$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)$$

where w_f is the weight matrix of the forget gate f_t and b_f is the offset.
The memory unit adjusts its previous state c_{t−1} and the current candidate memory state ĉ_t through the input gate i_t and the forget gate f_t to update the memory unit status. The updated state value of the memory unit c_t is:

$$c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t$$

The output gate is determined by the input value x_t at time t and the hidden layer output h_{t−1} at time t − 1. The formula for the output gate o_t is as follows:

$$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o)$$

where w_o is the weight matrix of the output gate at time t and b_o is the offset.
The output value of the hidden layer is determined by the output gate value o_t and the state value of the memory unit c_t at time t. The formula for the hidden layer output value h_t is:

$$h_t = o_t \odot \tanh(c_t)$$

It can be seen that the three types of LSTM gates perform different functions, allowing for deep learning of historical data. Although there are many types of deep learning algorithms, some are better suited to other fields, such as classification and recognition, due to differences in network structure. The LSTM method can continuously update and adjust its output based on historical data, and therefore performs better when processing time series data.
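The gate equations above can be illustrated with a single LSTM cell step in NumPy; the weight shapes and random initialization below are placeholders for illustration, not trained parameters.

```python
# One LSTM cell step in NumPy, mirroring the gate equations above:
# sigmoid input/forget/output gates, tanh candidate, elementwise update.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate memory state
    c_t = f_t * c_prev + i_t * c_hat         # memory unit update
    h_t = o_t * np.tanh(c_t)                 # hidden layer output
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):       # unroll five time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)                               # (4,)
```

Because h_t = o_t ⊙ tanh(c_t) with o_t in (0, 1), the hidden output is always bounded in (−1, 1), which keeps the recurrence numerically stable across long unrolls.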

Adam algorithm
The gradient is not accurate enough when processing a large amount of data due to the complexity of the LSTM structure, which affects the prediction results. Commonly used optimization algorithms include stochastic gradient descent (SGD), the adaptive gradient algorithm (AdaGrad), and root mean square prop (RMSprop) (Cheridito et al. 2020; Han et al. 2017; Lee et al. 2021). The Adam algorithm is currently the most popular deep learning optimization method because it not only improves operation efficiency but also consumes less memory (Kingma and Ba 2014). The Adam algorithm is based on adaptive estimation of low-order moments: the gradient's first- and second-order moment estimates are calculated to determine the learning step. The procedure is as follows:

Step 1: Set the initial learning rate α, the first-order moment decay coefficient β_1, the second-order moment decay coefficient β_2, and the maximum number of iterations t_max. Initialize the neural network weight parameter θ_0, the first-order moment estimate m_0, and the second-order moment estimate v_0.

Step 2: Calculate the gradient of the objective function f_t:

$$g_t = \nabla_\theta f_t(\theta_{t-1})$$

Step 3: Calculate the first-order moment estimate m_t and the second-order moment estimate v_t:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$$

Step 4: Calculate the first-order moment estimate m̂_t and second-order moment estimate v̂_t after deviation correction:

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$

Step 5: Update the parameter θ_t:

$$\theta_t = \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

Repeat Steps 2–5 until the maximum number of iterations t_max is reached.
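The five steps can be sketched directly in NumPy and applied to a toy quadratic objective; the hyperparameter values below are illustrative choices, not those used in the paper.

```python
# NumPy sketch of the Adam steps above, applied to the toy objective
# f(theta) = sum((theta - 3)^2), whose gradient is 2 * (theta - 3).
import numpy as np

def adam_minimize(grad_fn, theta0, alpha=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, t_max=2000):
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)                  # first-order moment estimate
    v = np.zeros_like(theta)                  # second-order moment estimate
    for t in range(1, t_max + 1):
        g = grad_fn(theta)                    # Step 2: gradient
        m = beta1 * m + (1 - beta1) * g       # Step 3: moment estimates
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)          # Step 4: bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (np.sqrt(v_hat) + eps)   # Step 5: update
    return theta

theta = adam_minimize(lambda th: 2 * (th - 3.0), np.zeros(2))
print(theta)                                  # close to [3, 3]
```

The bias correction in Step 4 matters early on: with m_0 = v_0 = 0, the raw moment estimates are biased toward zero, and dividing by (1 − β^t) compensates for this.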

KDE method
Traditional interval prediction methods primarily include parametric and nonparametric estimations. Nonparametric estimation methods, as opposed to parametric estimation, do not require the assumption of the error distribution in advance, so the estimation result is closer to the actual value.
The KDE method is one of the most practical nonparametric estimation methods because it does not require prior knowledge of the data distribution. The steps are as follows. Suppose the density function of the random variable X is f(x) and the empirical distribution function is F(x); then a simple estimate is:

$$f_h(x) = \frac{F(x+h) - F(x-h)}{2h}$$

where h is a nonnegative constant (the bandwidth). When h → 0, we can obtain the approximate estimate:

$$\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} k\!\left(\frac{x - x_i}{h}\right)$$

where N represents the number of samples and k(⋅) represents the kernel function. In this study, the Gaussian kernel function is adopted and can be expressed as follows:

$$k(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right)$$

On this basis, the confidence interval under a certain confidence level can be calculated. Given a confidence level of 1 − α, if the CTP value satisfies

$$P(CTP_{down} \le CTP \le CTP_{up}) = 1 - \alpha$$

then the interval [CTP_down, CTP_up] is called the confidence interval of the CTP value, where CTP_down and CTP_up, respectively, represent the lower and upper limits of the interval.
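A minimal sketch of this interval step, assuming SciPy is available: fit a Gaussian KDE to (synthetic) point-forecast errors, build the CDF numerically, and read off the lower and upper limits of a (1 − α) interval.

```python
# Hedged sketch of the KDE interval step; scipy's gaussian_kde stands in
# for the hand-rolled Gaussian kernel sum, and the errors are synthetic.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
errors = rng.normal(0.0, 1.0, 2000)           # synthetic forecast errors

kde = gaussian_kde(errors)                    # Gaussian kernel, auto bandwidth
grid = np.linspace(-6, 6, 2001)
cdf = np.cumsum(kde(grid))
cdf /= cdf[-1]                                # normalize to a proper CDF

alpha = 0.10                                  # 90% confidence level
lower = grid[np.searchsorted(cdf, alpha / 2)]
upper = grid[np.searchsorted(cdf, 1 - alpha / 2)]
print(lower, upper)                           # roughly symmetric around 0
```

In the forecasting system, these error quantiles would be added to the point forecast to obtain [CTP_down, CTP_up].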

Point and interval CTP forecasting model
The detailed procedure of the point and interval CTP forecasting model is as follows.
Step 1: Select the key influencing factors.
Using the Pearson correlation method, compute the correlation coefficients between the influencing factors and the CTP data. Divide the factors into relevant and irrelevant categories based on the Pearson correlation coefficient and the significance (p value). The relevant factors are fed into the prediction model.
Step 2: Decompose and reconstruct the original CTP data series.
Set the parameters of the CEEMDAN method as follows: Maxiterations = 1000, Std = 0.2; the CTP data will be decomposed into several series. Determine each series' SE value and reconstruct some of them into new series. The new series are fed into the prediction model.
Step 3: Configure the LSTM method parameters.
The LSTM model is configured with one layer, 64 hidden neurons, and a maximum number of iterations of 600.
Step 4: Perform interval forecasting.
Based on the previous steps, compute the prediction error, and estimate the error confidence interval under a given confidence level using the KDE method.
The proposed method's framework is demonstrated in Figure 4.

Data normalization
Deep learning methods such as LSTM are sensitive to data scale, which means that if the data are not preprocessed, the learning performance suffers greatly. As a result, the Min-Max method is used to normalize the CTP data to the range [0, 1]. The formula is as follows:

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where x_min and x_max denote the minimum and maximum values of the training and testing data and x_norm represents the normalized data.
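The normalization formula can be applied directly, for example:

```python
# The Min-Max formula above applied with NumPy; x_min and x_max are
# taken over the full series, as described in the text.
import numpy as np

def min_max_normalize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

prices = np.array([20.0, 25.0, 30.0, 40.0])   # toy price series
print(min_max_normalize(prices))              # 0, 0.25, 0.5, 1
```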

Performance evaluation criteria
Point forecasting performance evaluation criteria
Table 2 lists the point prediction performance evaluation indicators.
where i denotes the sample index and y_i and ŷ_i, respectively, represent the observed value and the predicted value. The three indicators above are of the minimum type, meaning that the smaller the indicator value, the better the model's fitting effect.
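Assuming Table 2 lists the standard MAE, MAPE, and RMSE definitions, the three criteria can be computed as:

```python
# The three minimum-type point-forecast criteria in NumPy (assumed to be
# the standard MAE, MAPE, and RMSE definitions from Table 2).
import numpy as np

def point_metrics(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mae = np.mean(np.abs(y - y_hat))                 # mean absolute error
    mape = np.mean(np.abs((y - y_hat) / y)) * 100    # mean abs. % error
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))        # root mean square error
    return mae, mape, rmse

y_true = np.array([50.0, 52.0, 48.0, 51.0])          # toy observed prices
y_pred = np.array([49.0, 53.0, 47.5, 50.0])          # toy predictions
mae, mape, rmse = point_metrics(y_true, y_pred)
print(mae, round(mape, 2), round(rmse, 3))           # 0.875 1.73 0.901
```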

Interval forecasting performance evaluation criteria
Prediction interval coverage probability (PICP) and prediction interval normalized average width (PINAW) are two commonly used criteria for evaluating interval prediction performance (He and Zheng 2018; Li et al. 2020), as shown below:

$$PICP = \frac{1}{n}\sum_{i=1}^{n} c_i, \quad c_i = \begin{cases} 1, & y_i \in [L_i, U_i] \\ 0, & y_i \notin [L_i, U_i] \end{cases}$$

where n represents the number of samples, L_i and U_i represent the lower and upper bounds, and y_i represents the actual value of the CTP.

$$PINAW = \frac{1}{nR}\sum_{i=1}^{n}(U_i - L_i)$$

where R represents the target variable range.
However, PICP and PINAW each reflect only a unilateral evaluation standard, which means they cannot reflect the forecasted results' overall performance. When one indicator performs well while the other underperforms, it is difficult to assess the model's interval prediction effectiveness. As a result, the following prediction interval comprehensive evaluation criterion (PICEC) is introduced in this study (Khosravi et al. 2011):

$$PICEC = PINAW\left(1 + \gamma(PICP)\, e^{-\eta(PICP - \mu)}\right), \quad \gamma(PICP) = \begin{cases} 1, & PICP < \mu \\ 0, & PICP \ge \mu \end{cases}$$

where η is a hyperparameter determining how much penalty is assigned to prediction intervals with a low coverage probability, μ represents the given confidence level, and PICEC is a minimum-type criterion (the lower the value, the better the performance). When PICP is lower than the given confidence level, an exponential penalty is applied; when PICP reaches the given confidence level, PINAW is the only factor considered.
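The three interval criteria can be computed as follows; the η and μ values and the toy intervals are illustrative choices, not the paper's settings.

```python
# PICP, PINAW, and PICEC as described above: an exponential penalty is
# applied only when coverage falls below the nominal level mu.
import numpy as np

def interval_metrics(y, lower, upper, eta=50.0, mu=0.9):
    y, lower, upper = (np.asarray(a, float) for a in (y, lower, upper))
    covered = (y >= lower) & (y <= upper)
    picp = covered.mean()                       # coverage probability
    r = y.max() - y.min()                       # target variable range R
    pinaw = np.mean(upper - lower) / r          # normalized average width
    gamma = 1.0 if picp < mu else 0.0           # penalize undercoverage only
    picec = pinaw * (1 + gamma * np.exp(-eta * (picp - mu)))
    return picp, pinaw, picec

y = np.array([10.0, 12.0, 11.0, 13.0, 9.0])     # toy actual values
picp, pinaw, picec = interval_metrics(y, y - 1.0, y + 1.0)
print(picp, pinaw, picec)
```

With intervals covering every point, PICP reaches 1.0, the penalty term vanishes, and PICEC reduces to PINAW alone.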

Relevant influencing factors selecting
A critical step before predicting carbon prices is feature selection. If the factors chosen are not appropriate, the model's fitting effect will suffer, and the prediction's accuracy will be reduced (Hao and Tian 2020). Based on previous research, the forecasting model's input indicators are the international carbon price, the energy market, the stock market, the exchange rate, the environment, and allowances. The carbon market price is determined not only by itself, but also by other carbon markets (Zeng et al. 2021). The price of the EU ETS, as one of the world's largest carbon markets, has a significant impact on the price of China's carbon market (Huang and He 2020). As a result, the European Union carbon emission allowances price (EUAP) has been selected as the international carbon price indicator. Furthermore, other market prices, such as energy and stock market prices, have a significant impact on carbon prices (Gong et al. 2021;Khalfaoui et al. 2022); therefore, Brent oil price (BOP), natural gas price (NGP), and CSI 300 Index are chosen as energy and stock market price indicators, respectively. In addition to the market prices mentioned above, factors such as the exchange rate, the environment, and allowances are commonly used input indicators for carbon price forecasting (Zhao and Hu 2016;Zhou and Li 2019;Batten et al. 2021). As a result, in this study, the exchange rate between EUR and CNY (ER (EUR-CNY)), temperature, humidity, and carbon emission allowances (CEA) are used. Table 3 shows the main factors influencing carbon prices, with data from the Wind database.
The relationship between the various influencing factors and the CTP is shown in Figure 5 (taking Beijing as an example). The Pearson correlation coefficients between the influencing factors and the CTP of each carbon trading market are listed in Table 4.
Indicators that meet the requirements of a significant p value (p < 0.1) are considered relevant, while others are considered irrelevant. Table 5 lists the relevant and irrelevant indicators.

Data decomposition and reconstruction
The CEEMDAN approach is employed to decompose the original CTP data into several series, and the sequences are reconstructed based on the SE value of each sequence. Figure 6 depicts the decomposed CTP data of Beijing (as an example).
The SE values of each sequence for the eight carbon trading markets are listed in Table 6.

Fig. 5 The relationship between influencing factors and carbon trading price of Beijing
According to the SE method introduced in Section 2.2, sequences with relatively similar SE values can be recombined into a new sequence. The reconstructed series are listed in Table 7.
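One simple way to make "relatively similar SE values" concrete is to sort the components by SE and merge neighbors whose values differ by less than a tolerance; the tolerance and the SE values below are hypothetical, since the paper's grouping was done by inspecting Table 6.

```python
# Hypothetical sketch of SE-based grouping: sort IMFs by sample entropy
# and merge adjacent components whose SE values differ by less than tol.
import numpy as np

def group_by_se(se_values, tol=0.05):
    order = np.argsort(se_values)
    groups, current = [], [int(order[0])]
    for idx in order[1:]:
        idx = int(idx)
        if se_values[idx] - se_values[current[-1]] < tol:
            current.append(idx)                 # similar SE: same new series
        else:
            groups.append(current)              # gap in SE: start a new group
            current = [idx]
    groups.append(current)
    return groups

se = np.array([1.80, 1.77, 0.90, 0.40, 0.38, 0.05])   # hypothetical IMF SEs
groups = group_by_se(se)
print([["IMF%d" % (i + 1) for i in g] for g in groups])
```

Each group of indices would then be summed into one NEW-IMF series before being fed to the prediction model.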

Point forecasting for CTP
The case study considers eight carbon trading markets in China, and the CTP data sets are downloaded from the China carbon trading data network (http://www.tanpaifang.com/). The time span of the eight carbon trading markets ranges from January 7, 2019, to December 13, 2019. The data set is divided into two parts: the first 80% of the data is the training set, and the last 20% is the test set.
Currently, the CTP data disclosed by China's carbon markets are mainly divided into four types: opening price, highest price, lowest price, and closing price (Huang et al. 2022). Because the closing price has the function of predicting the direction of the next trading day, most scholars carry out prediction research on the closing price of the carbon market (Zhou et al. 2022; Chen et al. 2022b). Hence, this paper also predicts the closing price of the eight carbon pilots in China, and the CTP mentioned in the following parts refers by default to the closing price of the carbon trading markets. In addition, we found that some carbon pilots were less active, leading to periods with no carbon trading. Referring to previous studies, the CTP on days without carbon trading is set to the CTP of the last carbon trading day.
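The padding rule for non-trading days amounts to a forward fill, e.g. with pandas (the dates and prices below are synthetic):

```python
# Fill non-trading days with the last trading day's price via forward
# fill; None marks synthetic days with no trades.
import pandas as pd

prices = pd.Series(
    [30.0, None, None, 31.5, 32.0],
    index=pd.date_range("2019-01-07", periods=5, freq="D"),
)
filled = prices.ffill()                        # carry last traded price forward
print(filled.tolist())                         # [30.0, 30.0, 30.0, 31.5, 32.0]
```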
Four benchmark models, namely ANFIS, LSTM, EEMD-LSTM, and CEEMDAN-LSTM, were used to compare with and demonstrate the performance of the proposed CEEMDAN-SE-LSTM model. Figure 7 depicts the point prediction results of the benchmark models and the proposed model, and the performance of each model in the eight carbon markets is listed in Table 8, which shows the prediction accuracy based on three criteria. Several conclusions can be drawn: (1) The prediction accuracy for the carbon market prices of Beijing and Chongqing is not satisfactory. In almost all the models, the carbon price prediction errors in these two markets are significantly larger than those in the other carbon markets. According to the descriptive statistics of carbon prices in the eight carbon markets (Table 1), the degree of data volatility significantly affects the accuracy of prediction. (2) The prediction accuracy of LSTM is significantly higher than that of the ANFIS model for nearly all the carbon trading markets except Hubei (the MAE, MAPE, and RMSE of ANFIS are 0.707, 2.52%, and 0.986; those of LSTM are 0.791, 2.86%, and 1.053). When dealing with time series problems such as carbon price prediction, deep learning algorithms have certain advantages over traditional machine learning algorithms in almost all instances, although this conclusion is not absolute for some specific sample data.
(3) For all the carbon trading markets, EEMD-LSTM and CEEMDAN-LSTM improved the prediction accuracy compared with the single LSTM model.

Table 7 The reconstructed series of the eight carbon trading markets ("/" indicates that no fourth series was formed)

Market | NEW-IMF1 | NEW-IMF2 | NEW-IMF3 | NEW-IMF4
Beijing | IMF1&IMF2&IMF3 | IMF4&IMF5&IMF6 | IMF7&r | /
Shanghai | IMF1&IMF2&IMF3&IMF4 | IMF5 | IMF6 | IMF7&r
Guangdong | IMF1&IMF2&IMF3 | IMF4&IMF5 | IMF6 | r
Tianjin | IMF3 | IMF1&IMF2&IMF4&IMF5&IMF6&IMF7 | r | /
Shenzhen | IMF1&IMF2&IMF3 | IMF4&IMF5 | IMF6 | IMF7&r
Hubei | IMF1&IMF2 | IMF3 | IMF4&IMF5&IMF6 | IMF7&r
Chongqing | IMF1&IMF2&IMF3 | IMF4&IMF5 | IMF6&IMF7 | r
Fujian | IMF2&IMF3 | IMF1 | IMF4&IMF5&IMF6&I… |
By comparing the prediction accuracy of each carbon price prediction model for each carbon market, the ranking of prediction effects is: CEEMDAN-SE-LSTM > CEEMDAN-LSTM > EEMD-LSTM > LSTM > ANFIS. Although the ANFIS algorithm introduced fuzzy systems into neural networks, it still has the drawbacks of slow convergence and susceptibility to local optima (Zhou et al. 2011). In contrast, the output of LSTM neurons can directly act on itself in the next time period, and the gate mechanism can effectively memorize long-term information, so LSTM outperforms ANFIS when training carbon price prediction problems. However, due to the high volatility and randomness of carbon prices, obtaining satisfactory prediction accuracy is still difficult, even with the better generalization ability of the LSTM model. By adding white Gaussian noise, decomposition methods such as EEMD can decompose the original data into multiple sequences that are more stable and ordered (Lin et al. 2020), effectively improving the prediction performance of LSTM. The decomposition effect of the EEMD approach, however, is primarily determined by the number of integrations. CEEMDAN can effectively solve this problem by adding a finite number of realizations of adaptive white noise, improving the decomposition effect and increasing prediction accuracy. This is also why CEEMDAN-LSTM outperforms EEMD-LSTM in most carbon markets in terms of prediction accuracy. Although the use of CEEMDAN-LSTM can improve carbon price prediction accuracy to some extent, it is also prone to overfitting and long calculation times (Zhou et al. 2022). As a result, the sequence reconstruction model based on the SE method is used to solve these issues. It can be seen that the use or nonuse of the SE method has a significant impact on the prediction accuracy when predicting the carbon prices of Shenzhen and Fujian.
This phenomenon could be caused by the overdecomposition of CEEMDAN, demonstrating the importance of a data reconstruction method.
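The reconstruction decision discussed above rests on computing the sample entropy of each decomposed series and merging series with similar values. As a minimal sketch (a generic SampEn implementation, not the paper's own code), a regular series should score lower than a noisy one:

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D series: -ln(A/B), where B counts pairs of
    matching templates of length m and A those of length m + 1, with
    tolerance r = r_factor * std(x) under the Chebyshev distance."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)

    def count_matches(m):
        # All overlapping templates of length m
        templates = np.array([x[i:i + m] for i in range(len(x) - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance from template i to all later templates
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(dist <= r)
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

# A smooth sine wave is more "predictable" than white noise,
# so its sample entropy should be lower
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 400)
se_sine = sample_entropy(np.sin(t))
se_noise = sample_entropy(rng.standard_normal(400))
print(se_sine < se_noise)
```

IMFs whose SE values fall close together would then be candidates for merging into a single new sequence before the LSTM stage.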
Meanwhile, we found that the CEEMDAN-SE-LSTM model proposed in this study does not perform best in every market: the CEEMDAN-LSTM model appears to perform better for Hubei carbon market price prediction. A possible cause is the following: most previous studies (including this one) determine whether IMFs can be reconstructed by comparing SE values, which is somewhat subjective. Because of the similarity of their SE values, some sequences that should not have been merged are reconstructed together, reducing the number of prediction runs but lowering forecasting accuracy. To confirm this conjecture, we calculated the standard deviation and average value of each CTP series in Hubei before and after reconstruction, as shown in Table 9.
As shown in Table 9, although the reconstruction algorithm can reduce the number of sequences and improve the prediction efficiency, it increases the volatility of the sequences to a certain extent. Before reconstruction, the standard deviation of the residual sequence is 0.6454, while after reconstruction, the standard deviation of NEW-IMF4 composed of IMF7 and residual sequence reaches 4.1707.
However, the average value of the NEW-IMF4 sequence is 32.6145, which accounts for most of the level of the original carbon price data. For such an important sequence, the sharp rise in its volatility inevitably reduces prediction accuracy. For carbon pilots that also merge the residual sequence with IMF components, such as Shanghai, the standard deviation only rose from 3.0822 to 3.4035. This is why CEEMDAN-SE-LSTM performs better than CEEMDAN-LSTM in the other carbon pilots.
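The mechanism described above can be illustrated numerically. The components below are synthetic stand-ins for IMF7 and the residual (a slow trend plus a low-frequency oscillation), not the Hubei data from Table 9; the point is only that summing two low-variance components can sharply raise the standard deviation while leaving the mean near the trend level:

```python
import numpy as np

# Synthetic stand-ins: a nearly flat residual trend and a
# low-frequency IMF oscillating over two full periods
t = np.arange(300)
residual = 30 + 0.01 * t                 # slowly rising trend, small std
imf7 = 4 * np.sin(2 * np.pi * t / 150)   # low-frequency oscillation

new_imf = imf7 + residual                # reconstruction merges the two

print(np.std(residual))   # small: the trend alone is nearly flat
print(np.std(new_imf))    # much larger: the oscillation's volatility dominates
print(np.mean(new_imf))   # mean stays near the trend level
```

As in Table 9, the merged series keeps a large mean (it carries the price level) but has several times the standard deviation of the residual alone, which is exactly the volatility increase blamed for the accuracy loss in Hubei.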

Sensitivity check
This section performs a sensitivity test to validate the impact of index screening on carbon price prediction. The main idea is to compare the model's prediction results in two cases: a carbon price forecasting model that considers only the relevant influencing factors, and one that considers all influencing factors (without index screening). Due to space constraints, the sensitivity analysis is limited to the CEEMDAN-SE-LSTM model.
There is no difference before and after index screening of CTP in Beijing and Guangdong because all of the main influencing factors of the carbon trading price in those two markets are relevant influencing factors (Table 5). As a result, Figure 8 only compares predicted and actual values in the other seven carbon trading markets. Figure 8 shows that, compared with CEEMDAN-SE-LSTM without index screening, the numerical deviation of CEEMDAN-SE-LSTM with index screening is smaller and within an acceptable range. The sensitivity test demonstrates the relevance and effectiveness of the index screening method.
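The screening step being switched on and off in this test can be sketched as follows. This is a generic Pearson-correlation filter on synthetic data; the factor names and the 0.3 cut-off are illustrative assumptions, not the paper's actual threshold:

```python
import numpy as np

def screen_factors(price, factors, threshold=0.3):
    """Keep only candidate factors whose absolute Pearson correlation
    with the carbon price reaches `threshold` (value illustrative)."""
    selected = {}
    for name, series in factors.items():
        r = np.corrcoef(price, series)[0, 1]
        if abs(r) >= threshold:
            selected[name] = r
    return selected

rng = np.random.default_rng(1)
price = np.cumsum(rng.standard_normal(200)) + 50   # toy carbon price path
factors = {
    "energy_price": 0.8 * price + rng.standard_normal(200),  # strongly related
    "unrelated_index": rng.standard_normal(200),             # pure noise
}
print(screen_factors(price, factors))
```

Feeding only the surviving factors into CEEMDAN-SE-LSTM corresponds to the "with index screening" case; passing the full `factors` dictionary corresponds to the "without" case.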

Interval forecasting for CTP
According to the previous point prediction results, the proposed CEEMDAN-SE-LSTM model outperforms all of the other point forecasting models mentioned above (ANFIS, LSTM, EEMD-LSTM, and CEEMDAN-LSTM), so interval forecasting is based on the proposed point forecasting model.
To verify the effectiveness of the KDE method in interval estimation, the comparison models in this study are normal distribution estimation (NDE), a parametric estimation method, and Bootstrap, a nonparametric estimation method (Perera and Silvapulle 2020). Figure 9 depicts the interval prediction results of the CTP data for the three methods.
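A common way to build a KDE prediction interval on top of a point forecast is to estimate the density of the point-forecast residuals and take its tail quantiles. The sketch below shows that generic construction with `scipy.stats.gaussian_kde` under assumed inputs; it is not the paper's exact procedure:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_interval(residuals, point_forecast, alpha=0.10, n_grid=2000):
    """(1 - alpha) prediction interval from a Gaussian KDE of the residuals:
    integrate the density numerically and read off the alpha/2 and
    1 - alpha/2 quantiles."""
    kde = gaussian_kde(residuals)
    grid = np.linspace(residuals.min() - 3 * residuals.std(),
                       residuals.max() + 3 * residuals.std(), n_grid)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]                       # normalize to a proper CDF
    lo = grid[np.searchsorted(cdf, alpha / 2)]
    hi = grid[np.searchsorted(cdf, 1 - alpha / 2)]
    return point_forecast + lo, point_forecast + hi

rng = np.random.default_rng(2)
residuals = rng.standard_normal(500)     # stand-in for LSTM forecast errors
low, high = kde_interval(residuals, point_forecast=40.0, alpha=0.10)
print(low, high)                         # 90% interval around the point forecast
```

The NDE baseline would replace the KDE step with a fitted normal distribution, and Bootstrap would resample the residuals directly; the KDE variant makes no distributional assumption while still yielding smooth quantiles.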
Take the interval prediction performance based on traditional evaluation criteria at the 90% confidence level (Table 11) as an example. The relative ranking of the three methods differs across the markets (Guangdong, Chongqing, and Fujian versus Shanghai, Tianjin, and Hubei), while NDE outperforms Bootstrap in one carbon trading market (Shenzhen).
(4) In the point prediction results, the CEEMDAN-SE-LSTM method outperforms almost all other prediction models in terms of prediction accuracy and fits most carbon market price data well. The picture differs for interval prediction, however: although the KDE method performs reasonably well across all confidence levels, it does not show the same "extremely outstanding performance" as in the point prediction part, and the NDE method's interval prediction performance is also respectable.
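The "traditional evaluation criteria" for prediction intervals typically include coverage (PICP) and normalized width (PINAW); the section above does not name them explicitly, so treating Table 11 as reporting these two metrics is an assumption. A minimal sketch of both:

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability: share of actual
    values that fall inside the interval."""
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    """Prediction interval normalized average width: mean width
    divided by the range of the actual values."""
    return np.mean(upper - lower) / (y.max() - y.min())

# Toy check: a wide interval covers everything but pays for it in width
y = np.array([38.0, 41.0, 39.5, 42.0, 40.5])
lower, upper = y - 2.0, y + 2.0
print(picp(y, lower, upper))   # 1.0: every point is covered
print(pinaw(y, lower, upper))  # 4 / (42 - 38) = 1.0
```

A good interval method pushes PICP up toward the nominal confidence level while keeping PINAW small, which is the trade-off behind the KDE/NDE/Bootstrap comparison above.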

Conclusions
This study introduces a novel hybrid CTP prediction model that combines key influencing factor selection, data decomposition and reconstruction, point and interval prediction, and optimization. The proposed model's validity and superiority are demonstrated through a case study of China's eight carbon trading markets. The main contributions of this study are as follows.
(1) Because different carbon trading markets have different characteristics, the conclusion drawn from research on a single subject is not representative. This study is aimed at all of China's carbon trading markets, and the proposed model demonstrates strong adaptability in different datasets.
(2) Based on the consideration of multiple influencing factors, an index screening method is proposed that improves prediction model accuracy while decreasing model complexity.
(3) The CEEMDAN method decomposes the original carbon price data into multiple series with lower volatility, and the SE method reconstructs the series into new sequences, which reduces computing time and improves prediction accuracy.
(4) The LSTM method has a more complex spatial structure and performs well in dealing with nonlinear carbon price prediction problems.
(5) The uncertainty in the behavior of carbon trading market participants is converted into an interval prediction of the carbon price. The proposed KDE model performs better in terms of prediction, and interval prediction is more practical and realistic.
Furthermore, some limitations and future research directions are discussed: (1) the data time scale for CTP analysis in this study is short, but it can be extended in a future study.
(2) With the establishment of China's unified carbon market, further carbon price prediction in China's unified carbon market is possible.