A novel hierarchical carbon price forecasting model with local and overall perspectives

Existing carbon price decomposition methods make effective predictions, promote energy saving and emission reduction, and play an increasingly important role in carbon trading platforms. However, few studies have examined the reorganization methods and different perspective treatments of the decomposition components. In this paper, a new component fusion method is introduced and, based on it, a hierarchical carbon price prediction model with two levels — one for a local perspective and one for an overall one — is developed. Firstly, the carbon price data are decomposed and the resulting components are subjected to deviation sample entropy fusion, which classifies them into high, medium, and low frequencies according to the physical significance of the entropy values. Next, fine-grained predictions are performed for the high, medium and low frequency components,


Introduction

Background description
We are aware that China, a rapidly developing nation, has undergone substantial reforms and significant changes. China's development trend is steadily improving as it transitions from an industrial to an urban economy, but the pollution this causes cannot be understated. Statistics show that China's carbon emissions climbed more than twice as much in the last three decades as the combined emissions of all developed nations worldwide [1]. It is crucial to address pressing environmental problems, put green and low-carbon ideas into practice, and quickly reach carbon neutrality [2,3]. The carbon price plays an indispensable role as a key indicator. To facilitate a smooth and organized carbon trading market, this study attempts to provide an efficient fusion approach and carbon price forecasting model. An increasing number of academics are taking on the challenge because of the difficulty of valuing the carbon trading market.

Literature review
The prediction methods mainly include traditional statistical methods and machine learning methods. Owing to their inherent limitations, traditional statistical methods such as the auto-regressive integrated moving average (ARIMA) [4] and the generalized autoregressive conditional heteroskedasticity (GARCH) [5] struggle to make reliable predictions for non-linear and non-stationary data [6].
Deep learning, today's hottest topic, is highly regarded for mining nonlinear data [7,8]. Yahsi [9] first experimented with artificial neural networks (ANN) and discovered that complex data cannot be accurately predicted by a single artificial neural network. Fan [10] found that the generalization capacity of the multilayer perceptron (MLP) is inadequate for complex data. It is still challenging to achieve accurate data evaluation using a single model [7,11]. A new hybrid method of the group method of data handling (GMDH) and the least squares support vector machine (LS-SVM) was put forth by Zhu [12], who discovered that it performed significantly better than a single model. Ji [13] suggested an ARIMA-CNN-LSTM model for forecasting, employing long short-term memory (LSTM) to establish long-term dependence and ARIMA to capture stratification, verifying the combined model's successful performance in time series forecasting. Along with combining models to improve performance, decomposition-based combinatorial models are also widely applied [14]. The decomposition method effectively reduces the effect of noise on prediction while allowing more features to be extracted without sacrificing accuracy. Singular value decomposition was used by Sun [15] to remove the high-frequency portion of the data, and an extreme learning machine (ELM) was then used to successfully predict the approximate and detailed sequences. Zhu [16] found EMD to be a promising prediction method while using EMD-GA-ANN to forecast carbon prices. Later, he [17] conducted a thorough analysis of the method and combined it with a nonlinear model to show that the decomposition model could enhance the prediction. Wang [18] increased the prediction accuracy by using the bidirectional long short-term memory (BiLSTM) network and Gaussian process regression (GPR) in addition to CEEMDAN-WT for feature selection, which demonstrates how broadly applicable the decomposition method is. Based on this, Li [19] combined complementary ensemble empirical mode decomposition (CEEMD) and variational mode decomposition (VMD) to find a solution to the highly complex and unpredictable problem. By merging the secondary decomposition of CEEMDAN and VMD, Wang [20] created a CEEMDAN-SE-LSTM-RF model, which is a significant advancement over prior models and significantly increases the viability of prediction. By combining SE and hybrid secondary decomposition, Zhou [21] raised the decomposition framework's prediction accuracy. The secondary decomposition technique was also tried by Jiang [22]. It has received widespread attention in the sectors of wind power, wind speed, and energy deviation pricing, demonstrating the accuracy and viability of its predictions [2,19,23].

Summary and Contribution
The decomposition method can increase prediction accuracy, but as the description above shows, it still has significant limitations. Few studies have been conducted on the reorganization methods and different perspective treatments of the decomposition components. Specifically, each part of the modeling has the following shortcomings. First, modeling each subsequence of the decomposition independently makes building the model more challenging and increases calculation time and cumulative error. Second, the connections between the components are disregarded. Third, the unpredictable part of the subseries still contains some information. Fourth, the reorganization approach relies on the experience of individual scholars.
Therefore, based on the above research, this paper proposes a novel hierarchical carbon price forecasting model with deviation sample entropy fusion and decomposition error correction.
(1) In this paper, a deviation sample entropy fusion method is proposed to solve the problem that the component reorganization scheme is difficult to determine. The proposed method is applied to the Guangzhou, Tianjin, and Beijing trading markets, and the best recombination scheme can be selected in all of them.
(2) Two hierarchical levels are established to study the local and overall perspectives respectively. In the local perspective, different frequency components are modeled separately to bring out the differences between components. In the overall perspective, the components resulting from the secondary decomposition are modeled as a whole to strengthen the connections between them. The error is used as the link between the two hierarchical levels.
(3) The model proposed in this paper achieves better results than the traditional secondary decomposition model, and its robustness is demonstrated by comparative experiments in three markets.
The background and theory of CEEMDAN, VMD, SE, LSTM, and GRU are explained in Section 2, which also details the framework's specific implementation in this paper. In Section 3, data analysis, model evaluation metrics, the specific deviation sample entropy fusion method, and data pretreatment are covered. In Section 4, the application of the deviation sample entropy fusion method is presented, the prediction outcomes of several hybrid model combinations are described, and comparisons with other models and data are made. Section 5 provides a summary of the complete study.

Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN)
Following the development of EMD, EEMD, and CEEMD [24-26], Torres [27] proposed complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), which breaks the original data down into components of various frequencies by incorporating adaptive noise to eliminate the modal mixing caused by EMD [28]. The purpose of decomposition is to remove noise and obtain more features. The decomposed data consist of components and residuals, defined as follows:

$$y(t) = \sum_{k=1}^{K} IMF_k(t) + r_K(t)$$

where $K$ is the number of decompositions, $IMF_k(t)$ denotes the $k$-th intrinsic mode function, and $r_K(t)$ denotes the residual. The specific steps of CEEMDAN are as follows.
Step 1: The signal to be decomposed is combined with $K$ white noise sequences of mean zero to form the $K$ sequences to be decomposed:

$$y_i(t) = y(t) + \varepsilon \delta_i(t), \quad i = 1, 2, \ldots, K$$

where $\delta_i(t)$ is a Gaussian white noise signal satisfying a normal distribution, $i$ is the index of the added white noise, $\varepsilon$ is the weight coefficient of the white noise, and $y(t)$ is the signal to be decomposed.
Step 2: EMD is applied to each of these sequences $y_i(t)$, and the first modal component is taken as the mean of the $K$ first EMD components; removing this component produces a residual sequence:

$$IMF_1(t) = \frac{1}{K} \sum_{i=1}^{K} E_1\big(y_i(t)\big), \qquad r_1(t) = y(t) - IMF_1(t)$$

where $E_k(\cdot)$ denotes the $k$-th component of the EMD decomposition of a sequence, $IMF_1(t)$ represents the first modal component obtained after CEEMDAN decomposition, and $r_1(t)$ is the value that remains after the first component has been removed.
Step 3: A new signal is obtained by adding Gaussian white noise to $r_1(t)$, and the second eigenmode component of CEEMDAN is obtained by performing EMD on it:

$$IMF_2(t) = \frac{1}{K} \sum_{i=1}^{K} E_1\Big(r_1(t) + \varepsilon E_1\big(\delta_i(t)\big)\Big)$$
Step 4: The steps above are repeated until the obtained residual signal is a monotonic function that can no longer be decomposed, at which point the original signal $y(t)$ is fully decomposed.
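For readers implementing this step, a minimal Python sketch follows, assuming the third-party PyEMD package as the CEEMDAN implementation (the paper does not name its software); the trial count, noise weight, and input file are illustrative.

import numpy as np
from PyEMD import CEEMDAN  # assumed implementation; pip install EMD-signal

y = np.loadtxt("carbon_price.csv")            # hypothetical input series
ceemdan = CEEMDAN(trials=100, epsilon=0.005)  # K noise realizations, noise weight
imfs = ceemdan(y)                             # rows: IMF_1(t) ... IMF_K(t)
residual = y - imfs.sum(axis=0)               # r_K(t), the monotonic remainder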

Sample entropy(SE)
The complexity of a time series can be measured using sample entropy, an idea put forward by Richman [29] in 2000. It works by calculating the likelihood that new patterns will emerge as the complexity and dimension of the time series vary. The higher the probability of producing novel patterns, the more complex the series in time; and the larger the entropy value, the lower the autocorrelation. Sample entropy has been widely used in time series processing because of its high consistency.
The specific steps of sample entropy are as follows. Suppose a time series consists of $N$ data points $\{x(1), x(2), \ldots, x(N)\}$.

Step 1: The original sequence is embedded in $m$ dimensions, giving $N-m+1$ subsequences $X_m(i) = \{x(i), x(i+1), \ldots, x(i+m-1)\}$, $1 \le i \le N-m+1$.

Step 2: The distance between each pair of subsequences is calculated as the maximum absolute difference of their elements, $d[X_m(i), X_m(j)] = \max_k |x(i+k) - x(j+k)|$.

Step 3: A similarity tolerance $r$ is defined. For each $i$, the number of $j \ne i$ with $d[X_m(i), X_m(j)] \le r$ is recorded as $B_i$, and $B^m(r)$ is its average over $i$. Repeating the procedure with dimension $m+1$ gives $A^m(r)$, and the sample entropy is $SampEn(m, r, N) = -\ln\big(A^m(r)/B^m(r)\big)$.
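A minimal numpy sketch of these three steps is given below; the embedding dimension m = 2 and tolerance r = 0.2 times the standard deviation are common defaults, not values taken from this paper.

import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    # SampEn(m, r, N) following Richman and Moorman (2000); a sketch.
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)  # similarity tolerance

    def match_count(dim):
        # Chebyshev-distance matches among the N-dim+1 template vectors
        templates = np.array([x[i:i + dim] for i in range(N - dim + 1)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(d <= r) - 1  # exclude the self-match
        return count

    B = match_count(m)       # matches at dimension m
    A = match_count(m + 1)   # matches at dimension m + 1
    return -np.log(A / B)    # diverges if A == 0 (very short or regular series)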

Variational mode decomposition (VMD)
Dragomiretskiy [30] introduced variational mode decomposition in 2014 as an adaptive and entirely non-recursive approach to mode decomposition and signal processing. Unlike CEEMDAN, VMD finds the center frequency and bandwidth of each mode via an iterative search for the optimal solution of the variational problem. It can process series with significant nonlinearity and great complexity.
The specific steps of VMD are as follows. (1) The Hilbert transform is used to obtain the analytic signal of each mode $v_k(t)$ and determine its one-sided spectrum. Each mode is then multiplied by the operator $e^{-j\omega_k t}$ and modulated to the corresponding baseband.
(2) The squared $L^2$-norm of the gradient of the demodulated signal is used to estimate the bandwidth of each mode component, which gives the constrained variational problem

$$\min_{\{v_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} v_k(t) = f(t)$$

where $v_k$ represents the $k$-th decomposed component and $\omega_k$ represents the central frequency of each component.

Two variables need to be introduced to find the optimal solution of the constrained variational problem: the Lagrange multiplier $\lambda$ and the second-order penalty factor $\alpha$. They convert the constrained variational problem into an unconstrained one, where $\alpha$ ensures the accuracy of signal reconstruction in a Gaussian noise environment and the Lagrange multiplier ensures the strictness of the constraint conditions. The extended Lagrange expression is as follows:

$$L(\{v_k\}, \{\omega_k\}, \lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} v_k(t) \right\|_2^2 + \left\langle \lambda(t),\; f(t) - \sum_{k=1}^{K} v_k(t) \right\rangle$$

(3) The center frequency of each component is then continuously updated using the alternating direction method of multipliers until the model's optimal solution is attained.
In the frequency domain, all components satisfy the following update relation [31]:

$$\hat{v}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{v}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2}$$

where $\omega$ represents frequency and $\hat{f}(\omega)$, $\hat{v}_i(\omega)$, and $\hat{\lambda}(\omega)$ denote the Fourier transforms of $f(t)$, $v_i(t)$, and $\lambda(t)$, respectively.

(4) $\hat{v}_k^{n+1}(\omega)$ is the residual of $\hat{f}(\omega) - \sum_{i \neq k} \hat{v}_i(\omega)$ after Wiener filtering. The algorithm recalculates the center frequency according to the power spectrum of each component:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \,\big|\hat{v}_k(\omega)\big|^2 \, d\omega}{\int_0^{\infty} \big|\hat{v}_k(\omega)\big|^2 \, d\omega}$$

The components, center frequencies, and multiplier are initialized along with $n$; in each execution period $n = n + 1$, and for $\tau > 0$ the multiplier is finally updated according to

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^n(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{v}_k^{n+1}(\omega) \right)$$
(5) The above steps are repeated until the termination condition of the iteration is satisfied.
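As a usage sketch, the vmdpy package (an assumption; the paper does not name its VMD implementation) exposes these steps through a single call; the parameter values below are illustrative, except that K = 8 matches the eight error subsequences reported later.

import numpy as np
from vmdpy import VMD  # assumed implementation; pip install vmdpy

f = np.loadtxt("error_series.csv")  # hypothetical series to decompose
alpha = 2000                # second-order penalty factor (bandwidth constraint)
tau = 0.0                   # step size of the Lagrange multiplier update
K = 8                       # number of modes v_k(t)
DC, init, tol = 0, 1, 1e-7  # no DC mode, uniform frequency init, tolerance

u, u_hat, omega = VMD(f, alpha, tau, K, DC, init, tol)
# u holds the modes v_k(t); omega holds the center-frequency iterates w_k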

Long short-term memory (LSTM)
The widely used LSTM was developed by Hochreiter & Schmidhuber [32]. An LSTM can be trained to remember only the information that is useful for making predictions and to discard the rest. The LSTM module includes four interacting layers, three sigmoid layers and one tanh layer, which interact differently than in RNN and DNN [33,34].
In contrast to the tanh activation function, which maps values to -1~1, the sigmoid activation function compresses values to 0~1. In this situation information can be updated or forgotten: when a number is multiplied by 0 its information is lost, but when it is multiplied by 1 it is kept. Throughout the procedure the cell state is utilized; through carefully designed gates, the LSTM adds information to or removes information from the cell state. The forget gate, the input gate, and the output gate are the three gates that make up the LSTM. The LSTM function has the following representation:

$$h_t, c_t = \mathrm{LSTM}(x_t, h_{t-1}, c_{t-1})$$

where $x_t$ denotes the value of the input sequence at time-step $t$. The first stage of the LSTM involves selecting which data should be eliminated from the cell state, and this decision is made by the forget gate. The forget gate applies a sigmoid nonlinear mapping to the input of the current moment and the output of the previous moment, producing a vector whose dimensions each take a value between 0 and 1, with 0 meaning ignored and 1 meaning maintained; this vector is then multiplied by the cell state.

$$f_t = \sigma\big(w_f \cdot [h_{t-1}, x_t] + b_f\big)$$

The weights $w_f$ in the above equation are not shared and simply multiply the different inputs. $h_{t-1}$ indicates the output of the previous moment, $x_t$ indicates the input at the current moment, and $\sigma$ denotes the sigmoid function.
In the next stage, the sigmoid input gate is used to determine which values need to be updated, followed by the creation of new candidate values by the tanh layer to be added to the state, which together determine what new information is stored in the cell state.

$$i_t = \sigma\big(w_i \cdot [h_{t-1}, x_t] + b_i\big), \qquad \tilde{C}_t = \tanh\big(w_c \cdot [h_{t-1}, x_t] + b_c\big)$$

where $i_t$ represents the output of the input gate, $w_i$ and $w_c$ represent unshared weights, and $\tilde{C}_t$ indicates the candidate value.
After completing the two processes above, the previous cell state $C_{t-1}$ needs to be updated to $C_t$: the old state is multiplied by $f_t$ to remove the unnecessary information, and then the scaled candidate values are added to the cell state:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
The output gate is the last step before the cell state is output. The output gate contains a sigmoid layer and a tanh layer. After the cell state is processed by tanh, the result is multiplied by the output of the sigmoid layer, and the selected part of the output is ultimately obtained.

$$o_t = \sigma\big(w_o \cdot [h_{t-1}, x_t] + b_o\big), \qquad h_t = o_t * \tanh(C_t)$$

where $o_t$ represents the output of the sigmoid layer and $h_t$ is the final output.

Gated recurrent unit (GRU)
GRU is an LSTM variant that mainly alters two LSTM components. First, the three original gates are replaced by an update gate and a reset gate. Second, the cell state and hidden state are merged, so the hidden state serves directly as the current moment's output [35]. The GRU function has the following representation:

$$z_t = \sigma\big(w_z \cdot [h_{t-1}, x_t]\big), \quad r_t = \sigma\big(w_r \cdot [h_{t-1}, x_t]\big), \quad \tilde{h}_t = \tanh\big(w \cdot [r_t * h_{t-1}, x_t]\big), \quad h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

where $x_t$ denotes the value of the input sequence at time-step $t$.

First of all, it is important to note that the phase space reconstruction is not done on the original carbon price directly; rather, it is built after decomposition. Second, the data creation for fine-grained prediction and for integrated prediction is different [36]. Assume that the fourth day's data will be predicted using the first three days' worth of data. In fine-grained forecasting, each subsequence is modeled individually, resulting in equal numbers of subsequences and models. In integrated prediction, regardless of how many subsequences there are, the subsequences are treated as a whole and only one model is created. The data division of fine-grained and integrated prediction is shown in Figure 3.

Fig 3 Data segmentation for fine-grained and fusion forecasts
Errors are handled in the same way. Assuming that a given day's data is related to the data from the previous three days, the fine-grained prediction is first completed, and then the errors from the previous three days are used to forecast the error of that day, as shown in Figure 4.

Fig 4 Demonstration of the error prediction technique
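The window construction behind Figures 3 and 4 can be sketched as follows; lookback = 3 mirrors the three-day example above, and the integrated variant simply stacks the subsequences as input channels of one model.

import numpy as np

def make_windows(series, lookback=3):
    # Previous `lookback` values predict the next one (single-step ahead).
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., None], np.array(y)

# Fine-grained prediction: one (X, y) set, and one model, per subsequence.
# Integrated prediction: stack all subsequences channel-wise for one model:
# X_int = np.concatenate([make_windows(s)[0] for s in subsequences], axis=-1)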
In this study, the number of relevant lag days for the current sequence is estimated using the ADF test [37], and the Ljung-Box test [38] together with ACF and PACF plots were used to determine whether there is autocorrelation in the sequences. The results show that the p-value is less than 0.05, and the plots show second-order truncation in the PACF and tailing in the ACF. This indicates a substantial autocorrelation in the sequence. The ACF and PACF are shown in Figure 5.
Data normalization is required to remove the impact of scale differences between the indicators. After normalization the data lie in the same interval and are comparable, which also considerably accelerates convergence. In this study, we employ the most popular standardization technique, which normalizes the data by its maximum and minimum values:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where $x$ is the original data, $x'$ is the dimensionless normalized value, $x_{min}$ is the minimum value of the data, and $x_{max}$ is the maximum value of the data.

The principle of sample entropy recombination is divided into two points:
(1) Components with similar sample entropy values are combined into one sequence.
(2) A recombined sequence contains two or more IMFs, except for the highest-frequency IMF.
The goal of DSE is to minimize the Euclidean distance within each recombination sequence. The specific steps of the DSE method are as follows:
a) The IMFs are sorted according to entropy value magnitude. The point with the smallest entropy value is used as the first vertex; this vertex (IMF7) forms the deviation upper bound with the next point (IMF6) and the deviation lower bound with the second point after it (IMF5). Then each of IMF6 to IMF2 is used as the vertex in turn, a triangle is constructed for every three points, and the mean Euclidean distance between each point in the triangle and the deviation upper and lower bounds is used as the deviation metric. An example of the construction is shown in Figure 6. In brief:
1. The subsequences are sorted from largest to smallest entropy value.
2. The upper and lower deviation bounds are constructed for each point and the deviation metric is calculated (a schematic sketch follows).
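To make the construction concrete, the sketch below scores one such triangle. It treats each IMF as a (position, entropy) point and uses the mean pairwise Euclidean distance as a simplifying stand-in for the upper- and lower-bound distances, whose exact definition is the authors' (Figure 6); treat this as illustrative, not as the paper's formula.

import numpy as np

def deviation_metric(triangle):
    # `triangle`: three (position, entropy) points -- the vertex, the
    # upper-bound point, and the lower-bound point. Mean pairwise distance
    # is an assumed stand-in for the paper's bound-based distances.
    p = np.asarray(triangle, dtype=float)
    pairs = [(0, 1), (0, 2), (1, 2)]
    return float(np.mean([np.linalg.norm(p[i] - p[j]) for i, j in pairs]))

# Example with illustrative entropy values: vertex IMF7, bounds IMF6 and IMF5.
print(deviation_metric([(7, 0.05), (6, 0.12), (5, 0.21)]))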
The entropy value of the sequence before decomposition is indicated by Series. The letters Or, Err, and Re stand for the original carbon price, error, and recombined sequences, respectively. As can be observed from the entropy values in Table 2, the decomposition of the initial carbon price exhibits a high followed by a low, consistent with moving from high frequency to trend. Additionally, the error's decomposition displays an inverted U shape, consistent with the VMD's central frequency decomposition. The major goals of CEEMDAN decomposition are noise reduction and feature enhancement, while reducing complexity and non-smoothness is the primary goal of VMD decomposition. In general, among the prediction results of the CEEMDAN decomposition, IMF0 and the last IMF have the worst prediction results, and the errors are usually generated by these two IMFs. Therefore the integrated prediction of the error can be considered an overall analysis of the carbon price, and it can be seen from panel c of Figure 7 that the 800 error data points are first smooth, then decrease, and finally increase, coinciding with the carbon price data. The specific experimental proofs are displayed in Section 4.

Accuracy assessment
In this study, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) are used to evaluate point prediction accuracy. The formulas are as follows:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad MAPE = \frac{100\%}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

The interval is set to 0~1 and the maximum-minimum normalization method is employed. Because some of the components take small values, values around 0 may occur after component prediction, which makes the MAPE display inaccurate. Therefore, whenever the normalized MAPE is used in this paper, a comment is included at the base of the table.
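These three metrics can be computed directly; the sketch below follows the standard definitions, and the comment on MAPE reflects the near-zero instability that motivates the normalized-MAPE footnotes mentioned above.

import numpy as np

def point_metrics(y_true, y_pred):
    err = np.asarray(y_true) - np.asarray(y_pred)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # MAPE is unstable when y_true is near 0, hence the normalized variant.
    mape = np.mean(np.abs(err / np.asarray(y_true))) * 100
    return mae, rmse, mape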

Results and discussions
The single model and the most fundamental decomposition prediction model are introduced in Section 4.1.1. DSE is used in Section 4.1.2 to restructure the sample entropy components and determine the impact of different restructurings on the prediction outcomes. The feasibility of error correction and the error correction proposed in this study are covered in Section 4.1.3. The connection between integrated and fine-grained prediction is explained in Section 4.1.4. In Section 4.1.5, the hierarchical model is compared with the secondary decomposition model. Additional carbon price data are subjected to experimental analysis in Section 4.2.

Basic decomposition framework analysis
In this paper, we use LSTM, GRU, DNN, BiLSTM, and BPNN to conduct tests on a single model. LSTM, GRU, BiLSTM, and DNN use a three-layer structure with tanh as the activation function, and BPNN uses 64 hidden units. The results of the comparison are shown in Table 3. From Table 3, we can see that LSTM has the best prediction results, followed by BiLSTM and GRU. Therefore, GRU, LSTM, and BiLSTM are mainly used as comparison models in the following model selection.

Table 4
Results for each component (IMFs normalized to 0-1)

Table 4 shows the evaluation results of the decomposed components of CEEMDAN; the results show that the accuracy with decomposition is higher than without. From the local perspective, the prediction trends of the three models are basically the same: IMF3-IMF6 perform well, with MAPE below 0.8 in all cases, while the worst predictions are for IMF0 and IMF7. IMF0 fluctuates more and shows underfitting, but because of its small values it determines the detailed performance of the results, while lag is the main cause of IMF7's error. Taking LSTM as an example, Figure 8 shows the prediction results for each component.

DSE restructuring
Each decomposed component must be modeled to be predicted, which has a significant impact on running efficiency. Runtime is cut with the DSE fusion technique. Table 5 displays the outcomes of the comparison.

Table 5
The impact of fusion on results

As can be seen from Table 5, DSE fusion reduces computing time by a factor of about 2-3. The reduction in the number of components inevitably leads to some decrease in accuracy, but the accuracy loss is within an acceptable range considering the huge reduction in time.

Table 7
Deviation SE fusion process

Usually the entropy values of the components decrease gradually, indicating that the complexity of the components is also decreasing gradually; nevertheless, to be safe, the components are arranged in order of entropy value from largest to smallest, and Table 6 shows the deviation metric values of each segment.
The fusion of the sequences from $i$ to $j$ into a new sequence is indicated by the notation $(i, j)$. The specific steps for calculating DSE are as follows: first, the initial fusion sequence of IMF6 and IMF7 is identified; next, IMF5 is added to the fusion sequence; finally, the deviation of (5, 7) is calculated. This study also compares, in experiments, three settings: no error correction, general error correction, and the error correction suggested in this study. The comparison results are displayed in Table 9.
A comparison of the predictions with and without error correction shows that forecasting the error after VMD decomposition increases the prediction's accuracy. The error correction described in this study, denoted PEC, can be seen to have a clear improvement effect; EC stands for standard error correction. Two different prediction strategies are used across the layers. By making fine-grained predictions for each component, the variability between components is captured and the interpretability of the components is enhanced. By integrating the predictions, all the components are considered as a whole and correlations between the components are found. The following aspects show the range of applications of fine-grained and integrated forecasting models. From the comparison results, it can be seen that the hierarchical strategy is better than direct secondary decomposition for both LSTM and GRU, with a 20%-30% improvement in MAPE. The comparison between LSTM and GRU shows that GRU is more effective for integrated prediction.

Analysis of other data
To illustrate the benefits of DSE and of the framework, this study also compares the carbon price data of Beijing and Tianjin. The sample entropy values for Beijing are displayed in Fig. 10 (a), and the decomposed eight subsequences are reorganized into the four sequences (0) (1, 2, 3) (4, 5) (6, 7). The deviation of each segment (0, 2, 1) (

Conclusion
This study proposes a novel hierarchical carbon price forecasting model with local and overall perspectives. By contrasting the models across various carbon trading markets, the stability and superiority of the approach and model are demonstrated. The main conclusions are summarized as follows: (1) The outcomes of different fusion techniques vary significantly, and DSE offers a fusion technique that reduces the impact of subjective factors.
(2) The combination of the two perspectives improves the accuracy of the framework by examining the components from two perspectives: the local perspective for initial forecasts and the overall perspective for revised forecasts.

$h_{t-1}$ indicates the hidden state of the previous time-step, $c_{t-1}$ indicates the cell state at the previous time-step, and $h_t$ indicates the hidden state of the current time-step.

Fig 1
Figure 1 depicts the flow of the framework in this paper. The four main steps of the novel hierarchical carbon price forecasting model with local and overall perspectives proposed in this paper are as follows: a) The feature data are pre-processed in the first step. Using CEEMDAN, the original data are first divided into several subsequences with distinct temporal properties. These subsequences are then reorganized by the deviation sample entropy fusion method (DSE) to produce new subsequences.

Fig 6
Fig 6 An illustration of a minimum deviation triangle

b) The two IMFs with the lowest entropy values among the unvisited IMFs are accessed, marked as visited, and taken as a reorganization sequence. c) The IMF with the lowest entropy value among the unvisited IMFs is selected, incorporated into the recombination sequence, and the deviation of this recombination sequence is calculated. d) It is determined whether the deviation of the recombinant sequence is less than the minimum deviation metric: if it is less, the newly incorporated IMF is marked as visited and step c is repeated; if it is greater, the newly incorporated IMF is separated, the construction of the recombinant sequence is complete, the deviation metric containing the recombinant sequence points is deleted, and step b is repeated. The procedure is given in pseudocode form below.

Begin
Input: the IMFs and their sample entropy values
1. The two IMFs with the lowest entropy values are accessed as the first recombination sequence.
2. While the number of unvisited points > 0:
3. The point with the lowest entropy value among the unvisited IMFs is incorporated into the recombination sequence.
4. The deviation of the recombination sequence is calculated.
4.1. If the deviation of this recombination sequence > the smallest deviation metric:
4.1.1. Separate the newly incorporated IMF to form a new reorganization sequence.
4.1.2. Remove the deviation metric that contains the recombination sequence points.
4.2. Else:
4.2.1. Mark the newly incorporated IMF as visited.
End

Variational mode decomposition for the error sequence

In addition to being fundamentally distinct from EMD and CEEMDAN in theory, VMD also differs in its outcomes, as can be observed from the non-curve-like shapes of the two sets of decomposition curves. Experiments show that VMD is particularly suited to a challenging sequence such as the error and can efficiently decompose high-complexity, low-smoothness sequences. The pairwise half prediction method is used to build the error model, with the first half of the training set serving as the training set and the second half serving as the error-building set. The resulting error series is then subjected to VMD decomposition. The error is decomposed into 8 subsequences by VMD, and integrated prediction is used to extract more valuable features while drastically reducing run time. The error decomposition curves and entropy values are displayed in Figures 8 c and d, as well as in Table 2.
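A compact Python sketch of the DSE pseudocode above is given below; the `deviation` and `threshold` callables are stand-ins for the triangle-based quantities of Section 3, not the authors' code.

def dse_fuse(entropies, deviation, threshold):
    # entropies[i]: sample entropy of IMF i; deviation(group): deviation of
    # a candidate fused group; threshold(group): the smallest deviation
    # metric still associated with the group's points (stand-in callables).
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    groups = [list(order[:2])]       # start from the two lowest-entropy IMFs
    for idx in order[2:]:
        candidate = groups[-1] + [idx]
        if deviation(candidate) <= threshold(groups[-1]):
            groups[-1] = candidate   # fusion accepted, IMF marked visited
        else:
            groups.append([idx])     # fusion rejected, start a new sequence
    return groups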

Fig 7
Fig 7 Decomposition, recombination, and their entropy values

Fig 8
Fig 8 The LSTM prediction graph following CEEMDAN decomposition

4.1.2. DSE restructuring

The recombination is successful since the deviation is the smallest in the minimum deviation triangle comparison table. The aforementioned stages are repeated up to the merging of IMF4 and the calculation of the deviation degree of (4, 7). Since $D(4,7) = 0.09 > D(4,6) = 0.07$, the recombination fails because the deviation degree does not meet the requirement. IMF5, IMF6, and IMF7 therefore form the first fusion sequence. The next fusion sequence is created by combining IMF3 and IMF4, after which IMF2 is merged in and the deviation degree of (2, 4) is determined. Since $D(2,4) = 0.014 > D(0,2) = 0.09$, the recombination fails because the deviation degree does not meet the requirement. The final three fusion sequences are (IMF0, IMF1, IMF2), (IMF3, IMF4), and (IMF5, IMF6, IMF7).

Fig 10
Fig 10 Beijing and Tianjin sample entropy

In the comparison of fusion methods, sets of divisions are made that are challenging for humans to decide. If the first four divisions in the Beijing entropy values were made based on the researchers' experience, they might be a little hazy; in the Tianjin entropy values, the first five sequences are the haziest. Therefore, the comparative experiments are mostly conducted on these sequence divisions.

(3) The experiments comparing three carbon price markets show that the model proposed in this paper outperforms the traditional secondary decomposition model. The model can be used in subsequent studies to tackle more challenging problems such as oil prices, wind speed, etc. The model only considers single-step forecasts; in the future, interval forecasts and multi-step forecasts may be considered, as well as how other factors (such as carbon emissions) may affect the forecasts.

Table 1
Main digital features of datasets.
Two layers are divided for modeling, and their data are split in two different ways. The local layer uses the first 1600 data points as the training set and the last 100 as the test set. The global layer uses the first 800 as the training set, data points 800-1600 as the validation set, and the last 100 as the test set. Figure 2 summarizes this in graphical form. The data characteristics of each group are shown in Table 1.
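In code, the two splits amount to simple slicing of the 1700-point series; the variable names and input file below are illustrative assumptions.

import numpy as np

data = np.loadtxt("guangzhou_price.csv")  # hypothetical 1700-point series
# Local layer: 1600 training points, 100 test points.
train_local, test_local = data[:1600], data[1600:1700]
# Global layer: 800 training, 800 validation, 100 test points.
train_glb, val_glb, test_glb = data[:800], data[800:1600], data[1600:1700]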

The main digital features of the three datasets are shown in Table 1. The principle of SE reorganization is to combine the parts with similar sample entropy. However, no clear indicator has been proposed for what counts as similar sample entropy, which depends on the experience of scholars. In this paper, we propose a new sample entropy fusion method to avoid the influence of such subjective factors.

Table 2
Variation in the entropy of each series

Table 3
Comparison of a single model

Table 8
Because the decomposed high-frequency series carries the detailed variations of the carbon price data, having trouble accurately forecasting the details is unacceptable. The typical method is a secondary decomposition of the high-frequency components; however, this overlooks the relationships between the components. Because high frequencies are challenging to forecast, this work uses a layered approach that combines the error and VMD in the second layer and integrates them to obtain the results. This approach not only addresses the issue but also strengthens the bond between the components. The method for generating the error series is also changed: half of Group 1's training set is used as the training set, the other half as the test set, and the prediction outcomes produced by this modeling are used as the actual training data of the error series. This is more relevant when considering prediction-related error.
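A sketch of this pairwise-half error-series construction follows; `fit` and `predict` stand in for the fine-grained layer's models and are assumptions, not the paper's code.

import numpy as np

def build_error_series(fit, predict, train):
    # Fit on the first half of the training set, predict the second half,
    # and use the residuals as the training data of the error series.
    half = len(train) // 2
    model = fit(train[:half])
    pred = predict(model, train[half:])
    return np.asarray(train[half:]) - np.asarray(pred)  # the error series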

Table 10
If the same decomposition technique is applied to the error sequence again, the waveforms of the decomposed components are nearly identical, and the unpredictable portion remains unpredictable. Decomposition algorithms like EMD, and similarly CEEMDAN, are unable to break down the remaining decomposed components efficiently. VMD can handle time series with high complexity and strong nonlinearity, matches the error series precisely, and handles this problem flawlessly.

Table 11
Influence of decomposition algorithm on model

Table 15
The anticipated outcomes are greatly influenced by the fusion method, and the DSE fusion approach suggested in this study has the advantage of minimizing human judgment error. Taking the entropy values of Beijing as an example, the other fusion methods result in a larger error due to the huge volatility of Beijing's series; by contrast, with the method suggested in this research, MAPE reaches 1.362.