A Combined Model Based on Society Cognitive Optimization Algorithm for Wind Speed Forecasting

Abstract: Wind energy, as a renewable energy source, has drawn the attention of society. Wind power generation can reduce environmental pollution and relieve power shortages in offshore islands, grasslands, pastoral areas, mountain areas, and highlands. Wind speed forecasting plays a significant role in wind farms: it can improve economic and social benefits and support operation scheduling for wind turbines in large wind farms. Researchers have proposed a variety of methods for wind speed forecasting, and the artificial neural network (ANN) is one of the most commonly used. This paper proposes a combined model based on existing artificial neural network algorithms for wind speed forecasting at different heights. We first apply the wavelet threshold method to the original wind speed data set for noise reduction. Three artificial neural networks, the extreme learning machine (ELM), the Elman neural network, and the Long Short-Term Memory neural network (LSTM), are then applied for wind speed forecasting. In addition, the variance reciprocal method and the society cognitive optimization algorithm (SCO) are used to optimize the weight coefficients of the combined model. To evaluate the forecasting performance of the combined model, we select wind speed data at three heights (20 m, 50 m, and 80 m) from the National Wind Technology Center M2 Tower. The experimental results show that the combined model forecasts better than the single models and performs well for wind speed at different heights.


Wind energy, an essential renewable green energy, has large reserves and wide distribution, and has been widely used in many fields. At present, wind power generation is the main form of wind energy utilization. The latest wind power data released by the World Wind Energy Association show that worldwide wind capacity reached approximately 744 gigawatts in 2020, of which an unprecedented 93 gigawatts was newly added [1]. Wind power generation has many advantages: it can relieve the pressure caused by the shortage of traditional energy and make a significant contribution to local life and development. Wind speed is the direct embodiment of wind energy, so accurate wind speed forecasting is of great significance and is a current research priority.

However, it is difficult to obtain highly accurate wind speed forecasts because of the random fluctuation of wind speed data caused by weather factors. Researchers have proposed a variety of methods for wind speed forecasting, including statistical methods, physical methods, ANN [2,3], and support vector machines (SVM) [4]. Statistical forecasting uses actual historical data, theoretical knowledge, and mathematical models to make quantitative forecasts about the development of things. It mainly includes the trend extrapolation method, the regression forecasting method [5], the Delphi method [6], the subjective probability method, the exponential smoothing (ES) method [7,8], the autoregressive integrated moving average (ARIMA) [9], the fuzzy system (FS) [10], and other methods. Singh et al. [11] proposed a new repeated wavelet transform (WT) based ARIMA (RWT-ARIMA) model, which has improved accuracy for very short-term wind speed forecasting. Liu et al.
[12] proposed a hybrid model based on empirical mode decomposition, novel recurrent neural networks, and ARIMA, in which ARIMA is employed to predict the low-frequency sub-sequences and one residual.

The artificial neural network is a commonly used method because of its particular strengths in regression and classification. In forecasting, the BP neural network and the Elman neural network are frequently used algorithms. Altan et al. [13] developed a new wind speed forecasting (WSF) model based on a long short-term memory (LSTM) network and decomposition methods with the grey wolf optimizer (GWO). Zhang et al. [14] proposed a novel model based on VMD-WT and a PCA-BP-RBF neural network for short-term wind speed forecasting. Catalão et al. [15] used the wavelet transform and an artificial neural network to forecast wind speed data in Portugal. Later, as the technology matured, researchers proposed combination forecasting models and hybrid forecasting models for wind speed forecasting. Many experimental results showed that combination and hybrid forecasting models could improve forecasting accuracy and stability. Determining appropriate weight coefficients is the critical step in obtaining better forecasting results. Compared with determining the weight coefficients directly, using modern intelligent optimization algorithms such as the genetic algorithm (GA) [16] and the particle swarm optimization algorithm (PSO) [17] to further optimize the weight coefficients is more conducive to accurate results. Li et al. [18] proposed a combination model based on variable weights for wind speed forecasting, which combined ARIMA, ENN, and BPNN. Zhang et al.
[19] proposed a combined model for short-term wind speed forecasting, which included a flower pollination algorithm based on chaotic local search (CLSFPA), five artificial neural networks, non-negative constraint theory (NNCT), and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Liu et al. [20] combined a data pretreatment strategy, a modified multi-objective optimization algorithm, and several forecasting models to forecast wind speed in China. By combining a convolutional neural network and a long short-term memory neural network, Chen et al.
[21] developed a multifactor spatio-temporal correlation model for wind speed forecasting.

The physical method [22] is based on precise mathematical and physical laws. Although the calculation is complex, some research still addresses physical wind speed forecasting methods [23,24]. The precision of the physical method is high, but it is not widely applied.

Wind speed at different heights is of great importance for the wind energy assessment of wind farms and for the design of wind-resistance coefficients for high-rise buildings. In this paper, a combined forecasting model, ELM-Elman-LSTM, based on the SCO algorithm is proposed for wind speed forecasting at different heights. First, the original wind speed data are de-noised by the wavelet threshold algorithm and then fed into each single model. The results of the single models are combined with the variance reciprocal method to obtain the intermediate forecasting results. Finally, we use the SCO algorithm to optimize the weight coefficients and obtain the final forecasting result. The social cognitive optimization algorithm is an optimization algorithm built on the core ideas of social cognitive theory; it is introduced in detail in a later chapter. To test the forecasting performance of the combined model and its accuracy at different heights, this paper selects wind speed data at heights of 20 meters, 50 meters, and 80 meters from the National Wind Technology Center M2 Tower. Experimental results show that the combined model ELM-Elman-LSTM can improve forecasting accuracy and gives good forecasting results for wind speed data at different heights.

The structure of this paper is as follows. The first chapter introduces the background and significance of wind speed forecasting and several forecasting methods, such as the artificial neural network, the combined forecasting method, and the hybrid forecasting method.
The second chapter introduces the theory of the artificial neural networks involved in the combination model proposed in this paper. The third chapter presents the combination model and the evaluation indexes selected in this paper. The fourth chapter introduces the experimental data and the division of the data set. In the fifth chapter, we carry out the experiments and analyze the results; to illustrate the forecasting performance of the combined model, it is compared with three single models. In the last chapter, we summarize the experiment.

There are diverse ANNs for wind speed forecasting, such as the BP neural network, the Radial Basis Function (RBF) neural network, and the Generalized Regression Neural Network (GRNN). Every artificial neural network has certain strengths and limitations for wind speed forecasting. In this paper, we choose three different artificial neural networks: the extreme learning machine (ELM), the Elman neural network, and the Long Short-Term Memory network (LSTM).

The ultimate goal of ELM is to obtain the smallest training error $E$, which is as follows:

$$\min\big(E(w, b, \beta)\big) = (H\beta - T)^{\mathsf{T}}(H\beta - T) \qquad (5)$$

where $w$ and $b$ are the input weights and thresholds of the hidden layer, $H$ is the output matrix of the hidden layer, $\beta$ is the weight matrix of the output layer, and $T$ is the target value matrix of the sample set. According to the singular value decomposition [27] and the Moore-Penrose generalized inverse [28,29], the weight matrix of the output layer is obtained.

The connection weight between the correlation (context) layer and the hidden layer is $w_1$, the weight between the input layer and the hidden layer is $w_2$, and the weight between the hidden layer and the output layer is $w_3$. The input and output of each layer are as follows:

That is to say, there is a full connection from the hidden layer to itself, which enables the RNN to deal effectively with the related problems. The structure of an RNN contains an input layer, a hidden layer, and an output layer, and its training algorithm is the Back Propagation Through Time (BPTT) algorithm. However, the BPTT algorithm cannot solve the problem of long-term dependence. LSTM emerged to solve this problem.

LSTM is a special kind of RNN, and its structure is shown in Figure 1. The difference between LSTM and RNN is that LSTM introduces a processor, the cell, to judge whether information is useful; the cell is the core of LSTM. There are three gates in the cell: the input gate, the forget gate, and the output gate. The input gate determines how much of the network input at the current moment can reach the cell.
The output gate controls the output of the cell. The forget gate determines whether the output of the previous state is fully retained, partially retained, or completely forgotten in the current state. The training algorithm of LSTM is also the BPTT algorithm. The learning steps of LSTM mainly include forward propagation and back propagation.

Forward propagation computes the gate activations in turn, ending with the output gate and the final output. In these formulas, $\sigma(\cdot)$ is the sigmoid function and $\tanh(\cdot)$ is the hyperbolic tangent function, whose mathematical expressions are as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Back propagation is a process of error propagation in two directions: one is backward in time, and the other is backward in space. Back propagation also updates the weight matrices and thresholds of the input gate, the output gate, the forget gate, and the cell. Because the partial derivatives involved in back propagation are very complex, they are not described in detail in this paper; the details can be found in [32].

3. Proposed combined model

The weight coefficients are the core of the combination model, and there are many ways to determine them, such as combining with equal weights or with unequal weights. This paper chooses the variance reciprocal method to calculate the weights of the combined model ELM-Elman-LSTM, and then optimizes the weight coefficients using the society cognitive optimization (SCO) algorithm.

3.1.

In this paper, we use the SCO-VRW method to calculate the weight coefficients of the combined model. That is, we first use variance reciprocal weighting (VRW) to determine the weight coefficients of the combined model, and then we use the SCO algorithm to optimize them.
Variance reciprocal weighting is a commonly used method for determining weight coefficients: a model with a smaller sum of squared errors is given a higher weight, and a model with a larger sum of squared errors is given a smaller weight. The calculation method is as follows:

$$w_i = \frac{D_i^{-1}}{\sum_{j=1}^{m} D_j^{-1}}, \qquad i = 1, 2, \dots, m$$

where $D_i$ is the sum of squared errors of the $i$-th single model and $m$ is the number of single models (here $m = 3$).

At present, many bionic intelligent optimization algorithms are used to optimize weight coefficients. Bionic intelligent optimization algorithms are efficient approximation algorithms, such as ant colony optimization (ACO), particle swarm optimization (PSO), and the cuckoo search algorithm (CS). Most of these algorithms are based on insect societies. Literature [33] proposed a society cognitive optimization algorithm (SCO) based on social cognitive theory. The SCO algorithm simulates the social learning ability described in social cognitive theory through competitive selection and neighborhood search. In the implementation of the algorithm, agents represent people in society, and the knowledge in society is expressed by the knowledge library. The social learning process is simulated through the interaction between the agents and the knowledge library to achieve optimization. There are four significant concepts in the SCO algorithm:

Neighborhood search: Given two knowledge points $x_1$ and $x_2$, the neighborhood search of $x_1$ uses $x_1$ as the reference point and $x_2$ as the center point to calculate a new knowledge point $x_3$.

Assume that the number of knowledge points in the library is $N_{pop}$, the number of social cognitive agents is $N_c$, and the number of iterations is $T$. The flow chart of the SCO algorithm is shown in Figure 2. According to Figure 2, the basic steps of the SCO algorithm are as follows:

(1) Initialization. The $N_{pop}$ knowledge points are randomly initialized, and knowledge points in the library are randomly assigned to the social cognitive agents. A knowledge point may not be assigned to more than one agent.

(2) Alternative learning process. Perform the following actions for each social cognitive agent.

(3) Updating the library. New knowledge points are added during the alternative learning process, so the $N_c$ knowledge points with the worst fitness need to be removed from the library to keep the number of knowledge points in the library unchanged.

(4) Determine whether the result satisfies the end condition. If not, repeat steps (2) to (4) until the end condition is reached.

Figure 2 shows the flowchart of weight coefficient optimization. The following points should be noted in the optimization process.


• In the initialization of the library, the knowledge points must satisfy the precondition that the weight coefficients sum to 1. Otherwise, the library must be reinitialized.
• New knowledge points are generated in the observational learning process, and each new knowledge point must be tested against the precondition that the weight coefficients sum to 1.

The structure diagram of the combined ELM-Elman-LSTM model is shown in Figure 3. According to Figure 3, the basic forecasting steps using ELM-Elman-LSTM are as follows.

Step 1: Data de-noising. The wavelet threshold algorithm is used to de-noise the original wind speed data.

Step 2: The three single models are used individually to forecast the de-noised data.

• Train the ELM on the de-noised wind speed data to obtain the forecasting result $y_1$.
• Train the Elman network on the de-noised wind speed data to obtain the forecasting result $y_2$.
• Train the LSTM on the de-noised wind speed data to obtain the forecasting result $y_3$.

Step 3: Forecast the wind speed data with the ELM-Elman-LSTM model.

• Use the variance reciprocal weighting method to calculate the weights of the combined model.
• Use the SCO algorithm to optimize the weights. The final forecasting result is the weighted sum of the three single-model forecasts:

$$y = w_1 y_1 + w_2 y_2 + w_3 y_3$$

where $w_1$, $w_2$, and $w_3$ are the optimized weight coefficients of ELM, Elman, and LSTM.
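The weighting procedure in Step 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the library size, the number of agents, the tournament size `tau`, the clip-and-rescale step that keeps the weights summing to 1, and the form of the neighborhood search (a uniform draw starting from the reference point and extending past the center point) are all assumptions made for illustration. The forecasts in the usage lines are made-up numbers, not data from the paper.

```python
import random

def sse(pred, actual):
    """Sum of squared errors between a forecast and the observations."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual))

def vrw_weights(preds, actual):
    """Variance reciprocal weighting: weight_i is proportional to 1 / SSE_i."""
    inv = [1.0 / sse(p, actual) for p in preds]
    total = sum(inv)
    return [v / total for v in inv]

def combine(weights, preds):
    """Weighted sum of the single-model forecasts at each time step."""
    return [sum(w * p[t] for w, p in zip(weights, preds))
            for t in range(len(preds[0]))]

def normalise(w):
    """Assumed repair step: clip to positive values, rescale to sum to 1."""
    w = [max(x, 1e-9) for x in w]
    total = sum(w)
    return [x / total for x in w]

def sco_refine(init, preds, actual, n_lib=20, n_agents=5, n_iter=50,
               tau=2, seed=0):
    """SCO-style refinement of the weights (hypothetical parameter values)."""
    rng = random.Random(seed)
    dim = len(init)

    def fitness(w):
        return sse(combine(w, preds), actual)

    # Library of knowledge points: the VRW weights plus random weight vectors.
    library = [list(init)]
    while len(library) < n_lib:
        library.append(normalise([rng.random() for _ in range(dim)]))
    # Each agent starts from a distinct knowledge point of the library.
    agents = rng.sample(library, n_agents)

    for _ in range(n_iter):
        for i, own in enumerate(agents):
            # Competitive selection: best of tau random library points.
            rival = min(rng.sample(library, tau), key=fitness)
            # The better point is the center, the worse one the reference.
            if fitness(rival) < fitness(own):
                center, ref = rival, own
            else:
                center, ref = own, rival
            # Neighborhood search (assumed form): ref + 2 * U(0, 1) * (center - ref).
            new = normalise([r + 2.0 * rng.random() * (c - r)
                             for r, c in zip(ref, center)])
            library.append(new)
            agents[i] = new
        # Library update: drop as many worst points as were added.
        library.sort(key=fitness)
        del library[-n_agents:]
    return min(library, key=fitness)


# Made-up observations and single-model forecasts, for illustration only.
actual = [5.0, 5.5, 6.1, 5.8, 6.4, 7.0]
preds = [
    [5.1, 5.4, 6.3, 5.8, 6.3, 7.1],  # stands in for the ELM forecast
    [5.4, 5.9, 5.8, 6.2, 6.0, 7.5],  # stands in for the Elman forecast
    [4.6, 5.2, 6.6, 5.3, 6.9, 6.6],  # stands in for the LSTM forecast
]
w0 = vrw_weights(preds, actual)      # initial VRW weights
w = sco_refine(w0, preds, actual)    # refined weights
final_forecast = combine(w, preds)
```

Because the VRW weights are placed in the library and only the worst points are ever removed, the refined weights in this sketch can never fit the training data worse than the VRW weights they start from.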

To test the forecasting performance of the proposed model and the effect of wind speed forecasting at different heights, we selected wind speed data from the National Wind Technology Center M2 Tower. The experimental data cover April 10 to April 12 and April 16, 2017, which are a Monday, Tuesday, Wednesday, and Sunday. The data include three heights: 20 meters, 50 meters, and 80 meters. The data were sampled every five minutes from 0:00 to 23:59, so there are 288 observations per day. The wind speed data at each height are shown in Figure 4. As can be seen from the figure, the general trend of the wind speed is the same at each height, and the greater the height, the greater the wind speed. In addition, comparing the four panels of the figure shows that the selected wind speed data are not periodic and vary from one day to another.

When wind speed data are measured and collected, noise is generated for various reasons. The original wind speed data are therefore de-noised to improve forecasting accuracy. Commonly used noise reduction methods include EMD, EEMD, and PCA. In this paper, we choose the wavelet threshold method. We process the same set of data with soft threshold de-noising and hard threshold de-noising and select the method with the better de-noising effect. Generally, the effect of de-noising is evaluated by the signal-to-noise ratio (SNR) and the mean square error (MSE): the higher the SNR and the smaller the MSE, the better the de-noising effect. The results of noise reduction are shown in Table 1. It can be seen from the table that hard threshold de-noising performs better than soft threshold de-noising.

Figure 5 shows the input and output of the ELM-Elman-LSTM model.
The data at the three heights have the same input and output modes. For example, at a height of 20 meters, the data from time 1 to 5 are used to forecast the wind speed at time 6, and the data from time 2 to 6 are used to forecast the wind speed at time 7. In general, the wind speed data from time K to K+4 are used to forecast the wind speed at time K+5. Therefore, the input of the model is the wind speed data at times K to K+4, and the output is the wind speed at time K+5. The input vector has a dimension of 5, and the output vector has a dimension of 1. In other words, for the artificial neural networks selected in this paper, the number of nodes in the input layer is 5, and the number of nodes in the output layer is 1.

In forecasting with single models, the evaluation indexes we selected are MSE, MAE, and MAPE. The number of hidden layers and the number of neurons in each hidden layer greatly affect the forecasting results. The ELM network has only one hidden layer. For the data selected in this paper, the ELM network with 20 hidden-layer neurons forecasts better than the network with 10 hidden-layer neurons. The forecasting results of ELM are shown in Figure 6, and Table 2 shows the evaluation indexes of the ELM model.

First, comparing the forecasting results at the same height across the different data sets gives the following results. For wind speed data at a height of 20 meters, ELM had the best forecasting effect on Tuesday, with the minimum values of all three evaluation indexes: the MSE is 0.1773, the MAE is 0.3475, and the MAPE is 9.89%. For wind speed data at heights of 50 meters and 80 meters, the smallest MSE and MAE occurred on Tuesday, while the smallest MAPE occurred on Sunday. For example, at 50 meters the MSE is 0.1914 and the MAE is 0.3387 on Tuesday, and the MAPE is 7.03% on Sunday.
Second, we compare the forecasting results at different heights in the same data set. The MAPE decreases as the height increases for all data sets, although the size of the change is not regular. For example, in the forecasting results on Tuesday, the MAPE is 9.89%, 7.68%, and 7.51% at heights of 20, 50, and 80 meters, respectively. There is no particular relationship between height and the values of MSE and MAE. For example, on Tuesday the MSE and MAE are 0.1773 and 0.3475 at 20 meters, 0.1914 and 0.3387 at 50 meters, and 0.1603 and 0.3222 at 80 meters.

In addition, the forecasting results at a height of 80 meters on Sunday are the best, with a MAPE of 6.34%.

When the Elman neural network is used to forecast wind speed, the same attention should be paid to the selection of hidden layer nodes. The result is shown in Figure 7.

First, comparing the forecasting results at the same height across the different data sets gives the following results. For wind speed data at a height of 20 meters, the Elman forecast has the smallest MSE and MAPE on Tuesday: the MSE is 0.2540 and the MAPE is 11.89%, while the MAE is 0.4179, which is higher than on Wednesday. For wind speed data at a height of 50 meters, all three evaluation indexes are smallest on Tuesday: the MSE is 0.1936, the MAE is 0.3315, and the MAPE is 7.46%. For wind speed data at a height of 80 meters, the MSE and MAPE are smallest on Tuesday, while the MAE is higher than on Monday.

Second, comparing the forecasting results at different heights in the same data set shows that there is no definite rule for the change of the three evaluation indexes. For example, on Monday, the MSE and MAPE decreased as the height increased, while the MAE did not follow the same pattern. On Wednesday and Sunday, the MSE and MAE increased as the height decreased, but the MAPE did not change in the same way. In addition, on Monday and Wednesday, the MAPE decreased as the height increased. In short, when the Elman neural network is used for wind speed forecasting at different heights, it is not certain that the MAPE will decrease as the height increases.

In addition, the forecasting results at a height of 50 meters on Tuesday are the best, with a MAPE of 7.46%.

The de-noised wind speed data are fed into the LSTM, and the forecasting results are shown in Figure 8 and Table 4. From the figure and table, we can draw the following conclusions.

First, we compare the forecasting results at the same height across the different data sets. The LSTM model performs best at a height of 20 meters on Tuesday, where the MAPE is 10.87% and the MSE is 0.2336, the smallest values among the four days at that height. For heights of 50 meters and 80 meters, the smallest MSE and MAE occur on Wednesday, but the MAPE is relatively large on that day. For example, at a height of 50 meters, the MAPE is 16.26% on Wednesday but 8.94% on Tuesday.

Second, comparing the forecasting results at different heights of the same data set shows that, across the three evaluation indexes, the LSTM follows no specific rule for wind speed forecasting at different heights.
However, from the perspective of MAPE alone, on three of the four days (all except Wednesday) the MAPE is smallest at a height of 50 meters. This suggests that LSTM has some advantage for wind speed forecasting at a height of 50 meters.
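The three single models above share the input and output layout described earlier in this section: the observations at times K to K+4 form the input, and the observation at time K+5 is the target. A minimal Python sketch of that windowing follows; the function name `make_windows` and the plain-list representation are assumptions for illustration, not part of the paper.

```python
def make_windows(series, window=5):
    """Split a series into (inputs, targets) pairs.

    inputs[k] holds `window` consecutive values starting at index k, and
    targets[k] is the value that immediately follows them.
    """
    inputs, targets = [], []
    for k in range(len(series) - window):
        inputs.append(list(series[k:k + window]))
        targets.append(series[k + window])
    return inputs, targets


# Toy series standing in for wind speeds at times 1..8.
speeds = [1, 2, 3, 4, 5, 6, 7, 8]
X, y = make_windows(speeds)
# X[0] is the data at times 1..5 and y[0] the value at time 6, and so on.
```

With `window=5`, a series of 288 observations (one day at five-minute sampling) yields 283 input/target pairs, assuming the windows are built within a single day.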

The forecasting results of ELM, Elman, and LSTM are combined with weights determined by the variance reciprocal weighting method, and the weight coefficients are then optimized by the SCO algorithm. Figure 9 shows the forecasting results of the ELM-Elman-LSTM model. It can be seen from the figure that the forecasting curve of the combined model is basically in line with the actual wind speed curve; at heights of 50 meters and 80 meters on Sunday in particular, the forecasting curve is closer to the actual curve. In comparison, the fit between the forecasting curve and the actual wind speed curve at a height of 20 meters is not as good as at the other two heights.

To further demonstrate the forecasting performance of the combined model, ELM-Elman-LSTM is compared with the three single models (ELM, Elman, and LSTM). Table 5 records the three evaluation indexes (MSE, MAE, and MAPE), and Table 6 records the R-square values of the four models.

First, we make a simple comparison of the three individual models. In terms of MAE and MSE, the ELM model is higher than the other two models at a height of 20 meters on Wednesday and lower than the other two models on the remaining data sets. In terms of MAPE, ELM is higher than Elman at a height of 50 meters on Wednesday and at 20 meters on Sunday, and higher than both other models at 20 meters on Wednesday. Except for these cases, the three evaluation indexes of ELM are smaller than those of the other two models. On this basis, ELM has better forecasting performance than Elman and LSTM.
In addition, it can be seen from the table that the evaluation indexes of Elman and LSTM fluctuate, and the two models have different advantages on different data sets. Next, we compare the combined model with the single models.

We first compare the forecasting results of the ELM-Elman-LSTM model with those of the ELM neural network. The MSE of ELM-Elman-LSTM is higher than that of ELM at a height of 50 meters on Wednesday and Monday. For example, for the wind speed at a height of 50 meters on Wednesday, the MSE of ELM-Elman-LSTM is 0.2600, while the MSE of ELM is 0.2587. Except for these cases, the three evaluation indexes of ELM-Elman-LSTM on the remaining data sets are smaller than those of ELM. For example, for the wind speed data at a height of 20 meters on Monday, the MAPE of ELM is 14.68% and the MAPE of ELM-Elman-LSTM is 13.46%, a decrease of 1.22 percentage points.

We then compare the forecasting results of the ELM-Elman-LSTM model with those of the Elman neural network. All three evaluation indexes of ELM-Elman-LSTM are smaller than those of the Elman model, so the forecasting performance of ELM-Elman-LSTM is better than that of the Elman neural network. For example, for the wind speed data at a height of 20 meters on Tuesday, the Elman model gives an MSE of 0.2540, an MAE of 0.4179, and a MAPE of 11.89%, while ELM-Elman-LSTM gives an MSE of 0.1618, an MAE of 0.3140, and a MAPE of 8.63%; the MAPE decreases by 3.26 percentage points.

Finally, we compare the forecasting results of the ELM-Elman-LSTM model with those of the LSTM neural network. The three evaluation indexes of ELM-Elman-LSTM are smaller than those of the LSTM model for all data sets. In terms of MAPE in particular, the value is greatly reduced by ELM-Elman-LSTM.
For example, at a height of 80 meters on Sunday, the MAPE of LSTM is 11.69% and the MAPE of ELM-Elman-LSTM is 5.96%, a reduction of 5.73 percentage points.

In conclusion, when MSE, MAE, and MAPE are used to evaluate the models, the forecasting performance of ELM-Elman-LSTM for wind speed at different heights is, with the exceptions noted above, superior to that of the three single forecasting models. According to Table 6 and Figure 10, for the wind speed at a height of 50 meters on Sunday, the R-square of ELM-Elman-LSTM is smaller than that of ELM. For the remaining data sets, the R-square of ELM-Elman-LSTM is greater than that of the three individual models. Moreover, the R-square of the combined model is generally close to 1. For example, for the data at a height of 80 meters on Tuesday, the R-square of ELM-Elman-LSTM reaches 0.9388. For the data at a height of 20 meters on Wednesday, however, the R-square of none of the four methods is close to 1; the highest value is 0.5447.

Next, we compare the forecasting results of the ELM-Elman-LSTM model on wind speed data at different heights. Figure 11 shows the R-square and MAPE of the combined model at different heights. According to Figure 11 and Table 6, the R-square for the wind speed data at a height of 80 meters is the largest of the three heights and the closest to 1; its maximum value reaches 0.9388. The MAPE at a height of 80 meters is greater than that at 50 meters only on Tuesday. It can therefore be said that ELM-Elman-LSTM has the best forecasting performance for wind speed at a height of 80 meters. In addition, if the model is evaluated only from the perspective of R-square, only the R-square for wind speed at a height of 20 meters on Wednesday is small, at 0.5447.
For the rest of the data sets, the forecasting performance for wind speed at a height of 20 meters is as good as that at 50 meters. However, from the perspective of MAPE, the forecasting performance at a height of 50 meters is better than that at 20 meters.

In short, the forecasting results show that ELM-Elman-LSTM can improve forecasting accuracy compared with a single forecasting model and obtains good forecasting results for wind speed data at different heights. Notably, the forecasting results at a height of 80 meters are the best.
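The comparisons in this section rely on MSE, MAE, MAPE, and R-square. The Python sketch below uses the standard textbook definitions of these indexes, which is an assumption about the formulas used in the paper; the function names and the toy numbers are illustrative only.

```python
def mse(actual, pred):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def r_square(actual, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy values, not data from the paper.
actual = [2.0, 4.0, 8.0]
pred = [1.0, 5.0, 8.0]
scores = (mse(actual, pred), mae(actual, pred),
          mape(actual, pred), r_square(actual, pred))
```

Because MAPE divides each error by the observed value, the same absolute error counts for less at higher wind speeds. This may partly explain why MAPE tends to be lower at greater heights, where the wind speed is greater, even when MSE and MAE show no clear trend with height.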

As a kind of renewable green energy, wind energy has become an increasingly important concern. Using wind power to replace power generation from the combustion of traditional energy sources is an essential application, which can reduce the pressure caused by the gradual depletion of traditional energy sources, reduce environmental pollution, and contribute to the sustainable development of society. Therefore, how to apply wind energy effectively and reasonably has become a focus of research. Wind speed is a direct manifestation of wind energy, and accurate wind speed forecasting has a significant impact on the allocation of wind farms and the stable development of the electricity market. This paper proposes a combined forecasting model, ELM-Elman-LSTM, in which the society cognitive optimization algorithm (SCO) is used to optimize the weight coefficients. In addition, to obtain better forecasting results, the original wind speed data are de-noised by the wavelet threshold method. Wind speed data at three different heights are chosen to test the model. The experimental results show that ELM-Elman-LSTM can improve forecasting accuracy compared with a single forecasting model. Furthermore, ELM-Elman-LSTM has good forecasting performance for wind speed at different heights.