Research and application of short-term load forecasting based on CEEMDAN-LSTM modeling

Abstract: Accurate short-term load forecasts play an important role in guiding and regulating the operations of electric utilities. This study proposes a combined forecasting strategy that couples a long short-term memory (LSTM) neural network with an improved whale optimization algorithm. The LSTM network mitigates the gradient vanishing and explosion problems caused by the cumulative multiplication of the activation function in RNNs when handling long sequences. To address the randomness of model parameter selection, the Whale Optimization Algorithm (WOA) is introduced; an Improved Whale Optimization Algorithm (IWOA) is then obtained by using the roulette method to alter the optimization behavior of individual whales so as to avoid falling into local optima. The complete ensemble empirical modal decomposition with adaptive noise (CEEMDAN) is further introduced to address the loss of training efficiency caused by the nonlinear, non-smooth nature of the input data, yielding a combined CEEMDAN-IWOA-LSTM prediction model. The results show that the model's prediction accuracy reaches 99.05%, and all prediction and assessment indexes outperform those of the other prediction models considered.


Introduction
Load forecasting is the systematic analysis of past load data and the accurate prediction of future load data while accounting for system operational characteristics, capacity expansion choices, environmental factors, and societal influences. Short-term load forecasting is becoming increasingly important: it can be used to optimize the configuration of generating units, provide data support for the utilization and consumption of renewable energy sources, and improve the operating environment of utilities, among other benefits.
Short-term power load forecasting techniques may be divided into three categories: traditional forecasting methods, contemporary forecasting methods, and combined forecasting models. The traditional methods can be categorized into regression forecasting, exponential smoothing, and time series forecasting. Traditional load forecasting methods are stable and reliable and can analyze and forecast based on historical data. However, they have limited predictive ability for complex, non-linear systems and perform poorly when dealing with unexpected events and data changes. While traditional load forecasting methods are still widely used in many industries and sectors, with advances in artificial intelligence and data science, more and more organisations and institutions are experimenting with new technologies and methods to improve the accuracy and efficiency of load forecasting. Modern prediction methods are divided into gray prediction methods, fuzzy prediction methods, and artificial neural network methods. The gray prediction method applies to loads with an exponential growth trend and achieves high prediction accuracy; when the model's gray scale is large, however, the prediction is not satisfactory. Cevik, H. H., et al. presented a methodology for anticipating short-term load based on neuro-fuzzy inference and fuzzy reasoning [3]. Commonly used control algorithms in the field of electricity loads are fuzzy control algorithms based on fuzzy logic and the fuzzy mathematics of fuzzy theory. The most popular prediction techniques are neural networks (NNs), which have powerful nonlinear mapping and self-learning abilities. Numerous studies have suggested different neural network structures to enhance prediction accuracy [7], and many professionals and academics have suggested deep neural networks to keep up with the rapid advancement of artificial intelligence. Deep neural networks (DNNs) [4] feature more hidden layers than regular neural networks, which increases sensitivity to temporal data correlation and, to some extent, enhances data characterization and information mining capabilities. Common deep neural networks include convolutional neural networks (CNNs), deep belief networks (DBNs), and recurrent neural networks (RNNs). LSTM was first created by Hochreiter, S., and was derived from the RNN [16]. It tackles the gradient vanishing and explosion problems that RNNs typically encounter [9], and it can maintain both short-term and long-term memories in the network [11]. LSTM has also been utilized effectively in a variety of other research disciplines [5], including phoneme classification, traffic prediction [19], language captioning [6], and action identification [18]. The LSTM model has attained great accuracy in the aforementioned research domains and is a very efficient neural network model that can efficiently learn the regularity information in historical sequences.
Individual neural network prediction algorithms are inefficient to train and tend to fall into local optima, and the resulting models have poor prediction accuracy. Combined forecast models have been proposed to address the issue of forecast accuracy, and the use of combinations of several artificial neural network models for short-term power load forecasting is a current research hotspot. Santra, A. S., et al. presented a combination model of a genetic algorithm (GA) and long short-term memory (LSTM) [17]; the parameters of the LSTM are optimized using genetic algorithms to increase the robustness of short-term load predictions. Hong, Y., et al. suggested a model for forecasting short-term loads based on deep neural networks using iterative ResBlocks to find the relationships between different power consumption behaviors [8]. An improved model with good prediction outcomes was proposed by Moradzadeh, A., et al. [13] by combining Support Vector Regression (SVR) with Long Short-Term Memory (LSTM). Support vector regression, however, scales poorly to big data sets: it performs badly when the number of features per data point exceeds the number of training samples, and noisy datasets can easily result in overlapping target classes. A combined prediction model based on long short-term memory neural networks and variational modal decomposition was proposed by He, F. F. [10]; this model performs variational modal decomposition on the original input signal to lessen noise interference. To improve the performance of LSTM neural networks, Meng, Z. R. proposed a long short-term memory model [14] combining empirical modal decomposition and attention mechanisms. A sparrow-algorithm-optimized LSTM prediction model was proposed by Liao, G. C. [12]; this model automatically searches for the best parameters during training rather than relying on human experience, using a swarm intelligence optimization method to improve parameter selection. This significantly improves the LSTM model's training effectiveness and prediction precision.
Considering the difficulties of forecasting and the limitations of existing techniques, this paper proposes a combined LSTM forecasting model for short-term power load forecasting, optimized with CEEMDAN (complete ensemble empirical modal decomposition with adaptive noise) and the IWOA (Improved Whale Optimization Algorithm). To create a combined model with high prediction accuracy, the LSTM neural network's parameters are optimized using the IWOA. To train multiple IWOA-LSTM models, the load data are divided into modal components with various characteristics using CEEMDAN, and the prediction results of the models corresponding to the various modal components are linearly superimposed to obtain the final prediction values. The validity and accuracy of the proposed combined model are then verified by comparing the prediction effects of the various models.
The rest of the paper is organized as follows. The second section introduces the fundamental concepts of the improved whale optimization algorithm, CEEMDAN, and the LSTM neural network. The third section introduces the integrated prediction model and the error evaluation indexes. The fourth section provides a pertinent example analysis. The fifth section concludes the paper.

Long Short-Term Memory Neural Networks (LSTM)
Compared to the traditional recurrent neural network, LSTM solves the gradient vanishing and gradient explosion problems well by adding a unit state to the hidden layer that allows sequence information to be passed from the beginning to the end of the sequence, ensuring that the information persists over long spans. LSTM incorporates the transmission of memory information from beginning to end in the implicit layer of the RNN model. The interaction of the "gate" structures modifies the memory information and determines which memory information is passed on. Figure 1 depicts the LSTM network's fundamental structure: x_t labels the input data at time t; h_t denotes the output of the hidden layer at time t; c_t denotes the state of the cell unit at time t; σ denotes the sigmoid function; tanh denotes the tanh function. An output value between 0 and 1 indicates the proportion of information permitted to pass. The h_{t−1} and c_{t−1} of the preceding moment, together with the current input x_t, determine the h_t and c_t of the LSTM at the current instant.
(1) The "forget gate", the first layer of the LSTM's interaction structure, links the output of the previous instant to the input of the present moment and determines, via T_f, whether the current input information is saved and passed on. If the output T_f is 0, the cell state c_{t−1} of the previous moment is forgotten; if T_f is 1, it is retained. The T_f equation is shown in (1):

T_f = σ(W_f · [h_{t−1}, x_t] + b_f)    (1)

(2) The "input gate" is the second interaction layer; its sigmoid and tanh functions determine the content of the input message. T_i determines which information is stored from the candidate values computed by the tanh function. The tanh layer calculates candidate values to be added to the cell state, mapping the input x_t and the preceding moment's hidden information h_{t−1} to values between −1 and 1. The equations are shown in (2) and (3):

T_i = σ(W_i · [h_{t−1}, x_t] + b_i)    (2)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)    (3)

After the first two layers, c_{t−1}, which carries the memorized information, combines with c̃_t, which carries the new information, as shown in (4); T_f determines the information forgotten from c_{t−1}, and T_i determines the information retained and added from c̃_t:

c_t = T_f ⊙ c_{t−1} + T_i ⊙ c̃_t    (4)

(3) The "output gate" is the third interaction layer; it determines the output value via the output activation T_o = σ(W_o · [h_{t−1}, x_t] + b_o), as shown in (5):

h_t = T_o ⊙ tanh(c_t)    (5)

In Eqs. (1) through (5), W_f, W_i, W_c, and W_o are weight vectors; b_f, b_i, b_c, and b_o are offsets; and ⊙ denotes element-wise multiplication.
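As a concrete illustration, the gate equations above can be sketched as a single LSTM time step in NumPy. This is a minimal sketch, not the paper's implementation; the weight matrices, biases, and dimensions are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step following Eqs. (1)-(5): forget, input, and output gates."""
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    T_f = sigmoid(W_f @ z + b_f)         # forget gate, Eq. (1)
    T_i = sigmoid(W_i @ z + b_i)         # input gate, Eq. (2)
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate state, Eq. (3)
    c_t = T_f * c_prev + T_i * c_tilde   # cell-state update, Eq. (4)
    T_o = sigmoid(W_o @ z + b_o)         # output gate
    h_t = T_o * np.tanh(c_t)             # hidden output, Eq. (5)
    return h_t, c_t
```

With all-zero weights and biases every gate outputs 0.5, so the new cell state is simply half the previous one, which makes the step easy to sanity-check by hand.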

An Improved Whale Optimization Algorithm (IWOA)
Australian researchers Mirjalili and Lewis presented the whale optimization algorithm in 2016 [15]. It is a meta-heuristic optimization method that simulates the bubble-net feeding of humpback whales. The whale algorithm is a clever optimization approach with simple operation, few parameters, and effective optimization, but it tends to fall into local optima or stray from the intended search direction during optimization. This subsection describes the improved whale optimization algorithm. The whale optimization algorithm works as follows: (1) Initialize the population P and set the basic parameters.
(2)Derive the optimal whale particle in the whale population and the corresponding fitness function value based on the fitness function.
(3)Obtain whale subpopulation S by probabilistically selecting the whale position update formula, calculate the corresponding fitness values, and update the population P accordingly.
(4) Repeat step (3) until the convergence condition is satisfied, at which point the algorithm's optimal solution is obtained.
Figure 2 displays the algorithm's flowchart. The search particles are initialized in the search space in WOA. When |A| < 1, WOA launches a localized search; when |A| ≥ 1, it launches a global search. The corresponding equations are shown in (6) to (8), where the total number of iterations allowed is denoted by T, the number of iterations already completed is denoted by t, and r ∈ [0, 1]. During this process, a decreases continuously from 2 to 0, and A ∈ [−a, a].
(1) Localized search phase. In this phase, the other particles approach the optimal particle X* while updating their position information, which is modeled by both the shrink-encircling and spiral updating methods. As a declines progressively in this stage, the range of A likewise shrinks, and a search particle X can arrive at any position between its current location and the optimal particle. Eqs. (9) and (10) give the position of a search particle after updating, where D⃗ denotes the distance between the search particle and the target.
(2) Spiral updating phase. The position after updating the search particles is determined by the spiral equation, as shown in (11) and (12), where D′⃗ is the separation between the ideal solution and the search particle, b is a constant factor, and l is a random number in [−1, 1]. Whales combine two methods of prey capture when hunting, and a probability p is introduced to identify which method a whale employs at a given moment; the formula is shown in (13), where p is a probability chosen randomly from [0, 1]. Table 1 illustrates the connection between the value of p and the position update method.
(3) Global search phase. In this phase a randomly selected search particle is updated, and the other particles move away from it to perform a global search. Equations (14) and (15) establish the mathematical model, where X⃗_rand stands for a search particle chosen at random from the population.
First, the population is randomly initialized; if the optimal particle changes during the iteration process, X* is updated, until the termination condition is met and the iteration ends.
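Under the assumption that positions are real-valued vectors, the three update rules above can be sketched as one WOA iteration in NumPy. The function name, the per-whale |A| test (taking the maximum over dimensions), and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def woa_update(X, X_best, t, T, b=1.0):
    """One WOA position update per Eqs. (6)-(15): each whale either shrink-encircles
    the best solution, spirals toward it, or searches globally around a random whale."""
    N, dim = X.shape
    a = 2.0 * (1.0 - t / T)                  # a decreases linearly from 2 to 0
    X_new = np.empty_like(X)
    for i in range(N):
        A = 2.0 * a * rng.random(dim) - a    # A in [-a, a], Eqs. (6)-(8)
        C = 2.0 * rng.random(dim)
        if rng.random() < 0.5:               # probability p selects the method, Eq. (13)
            if np.abs(A).max() < 1.0:        # shrink-encircle toward the best whale
                D = np.abs(C * X_best - X[i])
                X_new[i] = X_best - A * D
            else:                            # global search around a random whale
                X_rand = X[rng.integers(N)]
                D = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * D
        else:                                # logarithmic spiral around the best whale
            D_prime = np.abs(X_best - X[i])
            l = rng.uniform(-1.0, 1.0)
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2.0 * np.pi * l) + X_best
    return X_new
```

Repeatedly applying this update while tracking the best-fitness individual reproduces the loop in Figure 2.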
Because of the WOA's limited optimization performance, low precision, and propensity to enter local optima, numerous academics have suggested techniques to enhance the algorithm. Xu Hang et al. improved the whale algorithm with a hybrid strategy, combining multiple learning strategies to improve the population's ability to jump out of local optimal regions and thereby improve convergence accuracy. Bozorgi, S. M., et al. proposed combining WOA with Differential Evolution [1] to improve function optimization through the good exploratory ability of Differential Evolution (DE).
In this research, when the population diversity is small, we apply a mutation operation to improve population variety, using a roulette technique to choose the mutation object so that the algorithm breaks away from the local optimal solution. The search for an optimum then continues in the vicinity of the current optimal solution until a satisfactory solution is found. For this purpose, the conditions under which the population undergoes mutation are introduced in equation (16), where N is the total number of members of the population; α and β are variance constants; Fit_i is the fitness score of the i-th individual in the population; and Fit_max, Fit_min, and Fit_avg are the population's maximum, minimum, and mean fitness values, respectively. Both of the conditions in Equation (16) can, to some extent, indicate the level of population aggregation; a large difference between the two implies a high level of aggregation. In this scenario, as the number of iterations increases, the algorithm's optimum solution will not vary considerably. So that the algorithm continues searching near the current best solution until it finds one that meets the criteria to escape the local optimum, the population mutation operation is introduced as shown in Equation (17).
Where X_old is the individual chosen by the roulette method to undergo mutation; X_new is the mutated individual; and p is a random number in [0, 1].
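A sketch of how the roulette selection and mutation step might look in NumPy. Since Eqs. (16)-(17) are not reproduced here, the fitness inversion (favoring low-error individuals) and the random-reset mutation are stand-in assumptions for illustration only.

```python
import numpy as np

def roulette_select(fitness, rng):
    """Roulette-wheel selection: lower fitness (error) gets a larger slice of the wheel
    (the inversion is an assumption; the paper's exact weighting is Eq. (16))."""
    inv = 1.0 / (fitness + 1e-12)
    probs = inv / inv.sum()
    return rng.choice(len(fitness), p=probs)

def mutate(X_old, lb, ub, rng):
    """Random-reset mutation: each coordinate is redrawn within [lb, ub] with
    probability 0.5 (a stand-in for the paper's Eq. (17))."""
    p = rng.random(X_old.shape)
    return np.where(p < 0.5, lb + rng.random(X_old.shape) * (ub - lb), X_old)
```

In the IWOA loop, the selected individual X_old is replaced by X_new, which re-diversifies the population when the aggregation conditions of Eq. (16) are triggered.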
Because the IWOA offers a powerful global search capacity, it can be used to significantly enhance the performance of the LSTM model. The position of each individual encodes the hyperparameters of the LSTM neural network, and IWOA directly maps each individual's position to those hyperparameters. The LSTM neural network is then given the hyperparameters of the best individual found by IWOA for network training; the prediction process is shown in Figure 3. The fitness function of the IWOA in this study is the root mean square error (RMSE). Step 3: Randomly generate multiple populations to obtain the parameters of the corresponding LSTM network models.
Step 4: Use the root mean square error as the fitness function by calculating the difference between the LSTM model's true and predicted values for each set of population parameters. The population member with the smallest fitness is selected as the optimal solution of the iteration, the global optimality test is performed, and the corresponding fitness value is recorded.
Step 5:Start the iteration and use the improved whale optimization algorithm for parameter optimization and update the network parameters based on the population parameters and fitness values obtained last time.
Step 6: Repeat steps 4 and 5 until the predetermined number of iterations is reached.
Step 7: Output the optimal population parameters obtained from the search, corresponding to the four LSTM parameters: time step, learning rate, number of hidden layer neurons, and number of iterations.
Step 8: The obtained network parameters are used to construct an LSTM model to obtain the final predicted values.

Complete Ensemble Empirical Modal Decomposition with Adaptive Noise (CEEMDAN)
Empirical Modal Decomposition (EMD) is an adaptive method for signal analysis. It is appropriate for analyzing nonlinear and non-smooth signal sequences with a high signal-to-noise ratio. A finite number of intrinsic mode functions (IMFs) containing localized feature signals can be extracted from the sample data using EMD [2].
Instantaneous frequencies are used in conventional signal analysis to characterize local features of nonlinear signals. In order for every point of the instantaneous frequency of an IMF obtained from the decomposition to be meaningful, the IMF must satisfy the following requirements: the numbers of zero crossings and local extrema are identical or differ by one, and the mean of the envelopes of the local maxima and local minima is zero. Each IMF component is then subjected to the Hilbert transform in order to obtain instantaneous frequencies with physical meaning.
The algorithmic flow of EMD is as follows:
(1) The local maxima and minima of the sample x(t) are connected into upper and lower envelopes using the cubic spline method;
(2) The mean of the two envelopes gives the mean envelope m_1(t);
(3) The difference between x(t) and m_1(t) gives h_1(t), as shown in (18):

h_1(t) = x(t) − m_1(t)    (18)

(4) If h_1(t) does not fulfill the IMF conditions, let h_1(t) take the place of x(t), return to the first step, and sift again, as shown in (19):

h_11(t) = h_1(t) − m_11(t)    (19)

Repeating k times, as shown in (20):

h_1k(t) = h_1(k−1)(t) − m_1k(t)    (20)

When h_1k(t) meets the IMF conditions, it becomes the first IMF component, as shown in (21):

c_1(t) = h_1k(t)    (21)

(5) Subtracting the first IMF component c_1(t) from x(t) gives the residual term, as shown in (22):

r_1(t) = x(t) − c_1(t)    (22)

(6) The loop is executed n times to obtain the residuals r_2(t), r_3(t), ..., r_n(t); the monotonicity of the residual r_n(t) is then examined. If it is monotonic or constant, the decomposition is finished, yielding a number of IMF components and a trend term r_n(t), as shown in (23):

x(t) = Σ_{i=1}^{n} c_i(t) + r_n(t)    (23)

However, the EMD algorithm suffers from a serious modal aliasing phenomenon: affected by the frequency characteristics of the original signal, the same or similar feature signals appear in different modal components, so that feature extraction is less effective. The difference between Ensemble Empirical Modal Decomposition (EEMD) and EMD is whether Gaussian white noise is added to the original signal before each decomposition. EEMD is a technique for decomposing time series signals with EMD after adding Gaussian white noise [58]. However, the EEMD method cannot completely eliminate the white noise after the decomposition is complete.
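The first IMF condition (zero crossings and extrema equal in number or differing by one) can be checked numerically; this is a minimal sketch that omits the envelope-mean condition, which would require the cubic-spline envelopes of step (1). The function name is illustrative.

```python
import numpy as np

def satisfies_imf_count_condition(h):
    """Check IMF condition 1: the numbers of zero crossings and local extrema
    are equal or differ by at most one (the envelope-mean condition is omitted)."""
    signs = np.sign(h)
    signs = signs[signs != 0]                      # drop exact zeros
    zero_crossings = np.count_nonzero(np.diff(signs))
    d = np.diff(h)
    maxima = np.count_nonzero((d[:-1] > 0) & (d[1:] < 0))
    minima = np.count_nonzero((d[:-1] < 0) & (d[1:] > 0))
    return abs((maxima + minima) - zero_crossings) <= 1
```

A pure sinusoid passes the check, while the same sinusoid shifted away from zero fails it, since its extrema no longer pair with zero crossings.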
The complete ensemble empirical modal decomposition with adaptive noise, CEEMDAN, which decomposes and reconstructs the signal by adding white noise to the residuals order by order, was proposed as a solution to these issues. Jia Yilun et al. proposed combining CEEMDAN with support vector machines for power load forecasting [60], achieving good prediction results by using the CEEMDAN method to decompose the data as input to the support vector machine. A diagram of the CEEMDAN decomposition iteration process is shown in Figure 4.
CEEMDAN has better modal decomposition results and faster computation speeds compared to EMD and EEMD. The quantity of low-frequency, small-amplitude IMF components produced by the EEMD decomposition can be significantly decreased with CEEMDAN. The problem of large reconstruction errors in EEMD is solved while the phenomenon of modal aliasing is avoided.

Key Findings

Data sources
The data source for this paper is the load statistics of a city's economic development zone, as shown in Figure 5. Historical load data were recorded for a full year from the beginning of 2021 to the end of 2021, and the dataset was sampled every four hours, six sampling points a day, for a total of 2,190 sampling points. The dataset records the total electrical load (MW) and meteorological data for the area; Figure 6 shows the temperature data recorded in this dataset.
The load dataset is divided so that 80% of all data forms the training set, which is used in model training. The remaining 20% of the data is used as a validation set to validate the effects of model training and to improve the model. The test set, chosen from the load data that follows the dataset, is used to evaluate how well the model predicts the future; it was selected from March 4 to March 14, 2022, for a total of 10 days of data with 60 sample points.
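The 80/20 split described above should be chronological for time-series data (no shuffling, to avoid leaking future values into training); a minimal sketch, with the function name as an assumption:

```python
import numpy as np

def split_series(data, train_frac=0.8):
    """Chronological train/validation split: the first train_frac of samples train
    the model and the remainder validate it; order is preserved (no shuffling)."""
    n_train = int(len(data) * train_frac)
    return data[:n_train], data[n_train:]
```

For the year of 4-hour samples described above, this yields 1,752 training points and 438 validation points.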

Error assessment indicators
Prediction error evaluation metrics intuitively evaluate the performance of a prediction model with the aid of numerical values or percentages. The coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute percentage error (MAPE) are all used in this study to evaluate prediction accuracy. The numerical accuracy of the model is evaluated using MAPE and RMSE, and the accuracy of the model's curve trend is evaluated using R². The accuracy of the model prediction increases as the MAPE and RMSE values decrease. R² has a range of [0, 1], and the closer R² approaches 1, the more accurate the model's prediction. The equations are shown in (22) to (24):

MAPE = (100%/n) Σ_{i=1}^{n} |(ŷ_i − y_i)/y_i|    (22)
RMSE = √((1/n) Σ_{i=1}^{n} (ŷ_i − y_i)²)    (23)
R² = 1 − Σ_{i=1}^{n} (ŷ_i − y_i)² / Σ_{i=1}^{n} (y_i − ȳ)²    (24)

where y_i is the true value, ŷ_i is the predicted value, and ȳ is the mean of the true values.
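The three evaluation indexes are straightforward to compute; a sketch in NumPy, with function and array names chosen for illustration:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the load (MW)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Note that MAPE is undefined when a true value is zero, which is rarely an issue for aggregate electrical load but worth guarding against in general.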

Analysis of prediction accuracy of LSTM model and IWOA-LSTM model
In order to make the selection of LSTM model parameters more accurate, further reduce the difficulty of model training, and mitigate falling into local optimal solutions, this section uses the Improved Whale Optimization Algorithm (IWOA) to optimize the LSTM neural network's hyperparameter selection, increasing the model's generalization capacity and prediction accuracy. The improved whale optimization search process is shown in Table 2. Analyzing the output in Table 3, the IWOA parameter optimization process introduces a variance factor to make the model parameters jump out of the current optimal solution after finding the optimal parameters and iterate again near that solution. The new optimal solution is output if its fitness function value is smaller; otherwise, the optimal solution from the previous iteration is output.
With the best parameters obtained, the final IWOA-LSTM model is built, and its prediction results are compared to the test set data; Table 2 displays the final findings of the experiment. A comparison of the predicted results with the LSTM model is shown in Figure 5. Table 4 shows the maximum relative error and average absolute error of the two prediction outcomes after computing the prediction error value and relative error of the two models. It is clear from the data that the prediction error has significantly decreased. Figure 6 compares the relative error percentage and prediction error for the two models, showing the dramatic improvement in predictive power brought by IWOA. According to Table 4 and Figure 6, the LSTM model's maximum relative error and average absolute error are 8.45% and 4.23%, respectively, versus 4.57% and 1.88% for the IWOA-LSTM model. The IWOA-LSTM model's accuracy is thus increased by 45.9% for the maximum relative error and 51.6% for the average absolute error. The error values calculated for each assessment indicator are shown in Table 5.

CEEMDAN-IWOA-LSTM Combined Prediction Models
In this section, the CEEMDAN method is used to optimize the IWOA-LSTM model from the previous section in order to build a combined prediction model based on CEEMDAN-IWOA-LSTM. This enables more effective preparation of the neural network model's input samples, which enhances the efficiency of neural network training.
As validated in the preceding subsection, the IWOA-LSTM model has extremely high efficiency and small error in short-term power load forecasting; however, the nonlinearity and non-smoothness of the input time-series data significantly affect the model's training efficiency and accuracy. Consequently, prior to making a prediction, the original data are decomposed using the CEEMDAN method to produce a set of modal components as inputs to the prediction model. Using data with uniform features as inputs improves the prediction model's training effectiveness and yields better prediction outcomes.
In this section, a combined CEEMDAN-IWOA-LSTM forecasting model is designed to forecast the load data; it is shown in Fig. 7. The basic flow of the combined short-term load forecasting model based on CEEMDAN-IWOA-LSTM is as follows: Step 1: The preprocessed dataset is input to CEEMDAN and decomposed to obtain multiple modal component (IMF) sequences.
Step 2: The IMF components obtained from the decomposition are used as inputs to the IWOA-LSTM model for training, and the optimal hyperparameters of different component models are obtained.The corresponding IWOA-LSTM model for each component is obtained.
Step 3: The various IMF component data are predicted using the corresponding models to obtain the prediction results for each IMF component. The final prediction results are obtained by linear superposition.
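Steps 1-3 amount to a decompose-predict-superpose pipeline. A minimal sketch of the superposition step, where each per-component model is simply a callable (the model objects and component arrays are placeholders):

```python
import numpy as np

def combined_forecast(imf_components, component_models):
    """Step 3: predict each IMF component with its own trained model and
    linearly superimpose the per-component predictions into the final forecast."""
    predictions = [model(component) for model, component
                   in zip(component_models, imf_components)]
    return np.sum(predictions, axis=0)
```

Because CEEMDAN's components sum back to the original signal, summing per-component forecasts is the natural way to recombine them into a load forecast.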

Algorithm decomposition results and analysis
Parameterization of the CEEMDAN decomposition: it is customary to select the Gaussian white noise standard deviation between 0 and 1, the noise addition frequency between 50 and 100, and an associated upper bound on the maximum number of iterations. Following a number of decomposition tests, the CEEMDAN parameters in this study are selected as follows: the maximum number of iterations is set at 500, the maximum number of noise additions at 100, and the standard deviation of the Gaussian white noise at 0.2. The preprocessed load data are decomposed using EMD and CEEMDAN, respectively. A plot of the decomposed modal component data is shown in Figure 8. As can be seen in Figure 8, the EMD and CEEMDAN decompositions each produce seven IMF components and one residual component. The IMF1~IMF3 components from the two decomposition algorithms have the three highest frequencies among all the components, and their average amplitudes are small, reflecting a certain pattern of change within the load. The IMF4~IMF6 components in both EMD and CEEMDAN show good periodicity, reflecting localized load fluctuations. The IMF7 and residual components in both decompositions have larger average amplitudes and smoother trends; these components reflect the trend of the data series, and the residual term in particular reflects the global trend of the load. It can be noted that the EMD decomposition results show modal mixing, which would affect the accuracy of the model if used for power load forecasting. In contrast, CEEMDAN shows no modal aliasing, and its components retain the localized characteristics of the original load sequence.
The decomposition findings show that CEEMDAN's decomposition effect is superior to that of EMD: it avoids the modal aliasing problem that plagues EMD and significantly enhances the decomposition of data sequences. Comparing the decomposition results, CEEMDAN is more suitable for the decomposition of nonlinear and non-smooth power load signals.

Analysis of modeling examples
Based on the design and implementation in the previous subsection, the CEEMDAN algorithm is added to the IWOA-LSTM model to form a new combined model. This section compares the prediction results of the IWOA-LSTM model combined with EMD and with CEEMDAN, respectively.
Load data for the next two months were predicted by constructing combined EMD-IWOA-LSTM and CEEMDAN-IWOA-LSTM forecasting models. The optimal parameters output by the two models are shown in Table 6 and Table 7. As can be seen in Figure 10, the CEEMDAN-IWOA-LSTM model's predicted curve fits the true curve better than that of the EMD-IWOA-LSTM model, with a greater number of extreme points close to the true values. To assess how well each model predicted, the prediction error and relative prediction error of the two models were computed. Table 9 compares the errors made by the two models, and Figure 11 displays the comparison graph. Table 9 shows that the maximum relative error and average absolute error percentages for the EMD-IWOA-LSTM model are 2.96% and 1.25%, respectively, and 2.43% and 0.85% for the CEEMDAN-IWOA-LSTM model; the CEEMDAN-IWOA-LSTM model thus has higher prediction efficiency and accuracy. The results after assessing the data with each error evaluation metric are reported in Table 10. According to Table 10, the CEEMDAN-IWOA-LSTM model has a higher prediction accuracy than the EMD-IWOA-LSTM model: its MAPE decreased by 35.3% and its RMSE by 31.3%. The CEEMDAN-IWOA-LSTM model outperforms the EMD-IWOA-LSTM model in every index, yielding the best combined prediction model in this study.

Comparison of experimental results
By analyzing the experimental prediction results of the prediction models above, we obtained prediction result values and error indicator values for each model. In this section, the CEEMDAN-IWOA-LSTM model's prediction results are compared with those of the single LSTM model, the IWOA-LSTM model, and the EMD-IWOA-LSTM model. Figure 12 compares the four models' prediction curves, while Figure 13 compares the four models' prediction errors. Table 11 shows a comparison of the various error evaluation metrics. As can be seen from the comparisons above, the CEEMDAN-IWOA-LSTM model has a prediction accuracy of 99.05%, a considerable improvement over the other three models. The curve fit of the combined model is also better, judging by how closely its R² value approaches one. The proposed CEEMDAN-IWOA-LSTM combined forecasting model can thus forecast the electrical load data excellently.

Figure
Figure 1: Basic structure of the LSTM network.

Figure 2 :
Figure 2: Algorithm flowchart for the whale optimization

Figure 3: Flowchart of the IWOA-LSTM prediction process (initialize the LSTM network parameters and hidden-layer size; construct the WOA model with population size N, spiral constant b, and maximum iterations M; map whale positions to LSTM hyperparameters; mutate the selected individuals per Eq. (17) when triggered; iterate until the fitness requirement or the maximum number of cycles is reached; then train the LSTM with the global optimum).

Fig 5 :
Fig 5: Comparison of the two model prediction result curves

Figure 6 :
Figure 6: Percentage of prediction error and relative error for both models

Fig 9 :
Fig 9: Plot of the predicted results of the two models

Fig 10 :
Fig 10: Comparison of forecast results

Fig 11 :
Fig 11: Comparison of the prediction errors of the two models.

Fig 12 :
Fig 12: Comparison of the four model prediction curves

Fig 13 :
Fig 13: Comparison of the four model prediction errors

Table 1 :
Correspondence Between P And Renewal Methods

Table 4 :
Evaluation of the Two Models' Prediction Error Values in Comparison

Table 5 :
Comparison of Indicators for Model Prediction Assessment

Table 5
shows that the IWOA-LSTM prediction model put forward in this section enhances prediction precision and accuracy, with the MAPE accuracy of the IWOA-LSTM model increasing by 49.7% in comparison to the LSTM model, the RMSE accuracy improving by 48.4%, and the R² value being closer to one. Therefore, the IWOA-LSTM model has higher prediction accuracy and good prediction results. After adding the improved whale algorithm, the parameter selection of the neural network prediction model no longer relies on experience, and a more accurate and efficient combined IWOA-LSTM prediction model is obtained.

Table 8
displays the prediction results from the two models; Fig. 9 displays the curves of the prediction results, and Figure 10 displays the comparison graph.

Table 8 :
Comparison of predicted outcome values of the two models

Table 9 :
Evaluation of the Two Models' Prediction Error Values in Comparison

Table 10 :
Indicators for the assessment of forecast errors

Table 11 :
Indicators for the assessment of forecast errors