A Hybrid Ensembled Double-Input-Fuzzy-Modules Based Precise Prediction of PV Power Generation

Background: As one of the most widespread renewable energy technologies, photovoltaic (PV) power generation provides great environmental and economic benefits. The uncertain output of PV power generation systems has attracted much concern because of its randomness and intermittency. Precise prediction of PV power generation will greatly improve the quality of electric energy and enhance the stability of power system operation. Inspired by the powerful ability of fuzzy logic to deal with uncertainty and the strength of machine learning in time series prediction, a hybrid ensembled model consisting of Double-Input-Fuzzy-Modules (DIFM) and an Extreme Learning Machine (ELM) is proposed in this paper. Firstly, the PV power generation data are taken as the input of each DIFM in order to handle the uncertainties efficiently. Then, the outputs of the DIFMs are used as the input of the ELM. Moreover, least square estimation is applied to train the parameters of the hybrid ensembled model to further enhance the prediction precision. Finally, the proposed hybrid ensembled model is applied to a real-world PV power generation forecasting task.
Results: The case study and comparison results indicate that the proposed hybrid ensembled model outperforms other methods, including the adaptive network-based fuzzy inference system, the single-input-rule-modules connected fuzzy model, support vector regression and multiple linear regression, in terms of the mean absolute error, the root mean square error, the mean absolute error percentage and the mean relative error.
Conclusions: This study took advantage of the ability of fuzzy logic to deal with uncertainty and the strength of machine learning in time series prediction, and combined the DIFM and the Extreme Learning Machine into a hybrid ensembled DIFM based PV power generation prediction model. Experimental results demonstrated the superiority of the proposed model over the compared methods in both prediction accuracy and calculation speed.


INTRODUCTION
PV power generation has been regarded as one of the most important technologies for reducing carbon emissions and enhancing sustainable development because it is clean, safe, and sustainable. The statistical report [1] showed that the global newly installed capacity of PV power generation systems exceeded 100 GW in 2018, bringing the cumulative capacity to over 500 GW. Nevertheless, the high penetration of PV power generation challenges the reliability and stability of the current power system, since its output is strongly nonlinear and highly uncertain. Hence, it is critical and urgent to predict PV power generation accurately for better planning and operation of the power system.
Recently, various models and methods, including physical models [2,3], data-driven models [4][5][6] and artificial intelligence methods [7][8][9][10], have been presented for forecasting the output of PV power generation. Physical models depend heavily on the analysis of numerous circuit parameters, such as the shunt resistors, the diode influence factors and the corresponding coefficients. Therefore, they are hard to apply widely across different PV power generation systems, since various characteristic parameters must be identified for each. Moreover, such models mainly consider the system parameters while ignoring other influencing factors, for example the weather and the end-customer demands, which leads to poor prediction accuracy. With the development of data mining technologies, many researchers have proposed data-driven models using the historical running data of PV power generation, meteorological data, demanded load data, etc. Unfortunately, data-driven models can hardly depict the nonlinear features of PV output power, so their prediction accuracy is still not high [11]. Although artificial intelligence methods outperform the above two types of models in accuracy, they do so at the cost of significantly higher computational complexity and lower prediction speed. Hence, artificial intelligence methods are not well suited to short-term or real-time prediction of PV output power. In addition, such methods rarely capture the linear features of the PV power generation system. Whether physical, data-driven or artificial intelligence based, all of these are single models and can hardly meet the prediction requirements owing to the complex characteristics of the PV power generation system, e.g., nonlinearity, chaos and intermittency.
By utilizing the respective advantages of different single models and methods, hybrid models have attracted increasing attention from researchers. According to the way in which they are combined, such models can be classified into the following types:
1). Combining single models, mainly artificial intelligence models, with optimization algorithms. William et al. [12] proposed a genetic algorithm-based support vector machine (GASVM) hybrid model for short-term power forecasting of a residential scale PV power generation system. In [13], Semero et al. proposed a hybrid model based on a combination of PSO and the adaptive neuro-fuzzy inference system (ANFIS) [14] for one-day-ahead hourly PV power generation prediction in a microgrid. In [15], Ni et al. applied the differential evolution algorithm to optimize the combination weights of the Extreme Learning Machine (ELM) and then proposed a hybrid forecasting model based on the ELM and differential evolution algorithms for short-term PV power generation. Liu et al. [16] applied the chicken swarm optimizer to optimize the weights and thresholds of the ELM, in order to improve the prediction performance and strengthen the convergence. These hybrid models are superior to single models for predicting the output of PV power generation. However, the optimizers are only used to optimize the model parameters and accelerate the computation; they cannot extract new features.
2). In order to extract more features from the complex datasets of PV power output, some papers proposed hybrid models that combine different methods. Majumder et al. [17] applied variational mode decomposition (VMD) to extract features from the original nonlinear dataset, and then used the extracted features to train a robust kernel-based ELM (RKELM), in order to improve the forecasting accuracy for the PV power generation system. Zhang et al. [18] presented a new integrated model based on improved empirical mode decomposition (IEMD) and autoregressive moving average with exogenous terms (ARMAX), in order to better capture the characteristics of the PV power output. Similarly, Wang et al. [19] proposed a hybrid model that combines a convolutional neural network with a long short-term memory network to predict PV power generation. Moreover, Wang et al. [20], Giorgi et al. [21] and Malvoni et al. [22] respectively used the wavelet transform (WT) to decompose the original data and extract different features, and then used different artificial intelligence methods, e.g., LSSVM, improved deep belief network (IDBN), generalized regression neural network (GRNN), ANFIS, etc., to achieve the final forecasting. Although extracting features with a signal decomposition model first clearly improves the forecasting results, the single artificial intelligence models share common shortcomings: complex and randomly initialized parameters, an uncertain model structure, and difficulty in reaching the global optimum.
3). In order to overcome the above-mentioned problems, the third type of hybrid model combines types 1) and 2). The features are first extracted by the WT, signal decomposition models or other methods. Then, artificial intelligence models are used to achieve the prediction. Finally, optimization algorithms are applied to optimize the parameters or the structure so as to further increase the prediction accuracy. Shang et al. [23] applied an enhanced empirical mode decomposition (EEMD) to obtain the features, and then selected the improved support vector regression (ISVR) method to achieve the prediction. An optimization algorithm was used to fine-tune the related free parameters. Eseye et al. [24] combined WT, particle swarm optimization (PSO) and SVM to improve the short-term prediction accuracy of a real microgrid PV power generation system. Zhang et al. [25] proposed an adaptive hybrid model combining improved VMD (IVMD), autoregressive integrated moving average (ARIMA) and improved DBN (IDBN) to predict the day-ahead PV output power. Behera et al. [26] proposed a hybrid model based on EMD and an optimized ELM to predict the PV power output. Models of this type can achieve better prediction results than the above two types. However, their structure and computation are complex. Moreover, the high volatility, uncertainty and randomness of the PV power generation system should be further considered in order to obtain more satisfactory forecasting results.
On the other hand, the fuzzy logic system (FLS), as one of the powerful tools for handling high levels of uncertainty, has been widely applied in smart grids, intelligent transportation, intelligent cities, etc. Peng et al. [27,28] applied the FLS in wireless sensor networks to handle the uncertainties of power allocation and energy consumption. Li et al. [29] combined the FLS and the wavelet transform to achieve short-term building electrical load forecasting. Moreover, short-term traffic flow prediction has also been studied based on a data-driven FLS [30].
Therefore, this paper presents a hybrid ensembled model based on the Double-Input-Fuzzy-Modules (DIFM), the ELM and an optimization algorithm, in order to further improve the forecasting accuracy of PV power generation. The contributions and novelties of this study can be summarized as follows:
- A hybrid ensembled model is proposed for PV power generation forecasting. In the proposed model, the mapping between the original data and the features is obtained through the DIFMs. This can effectively deal with the uncertainties and nonlinear features of PV power generation data.
- The outputs of the DIFMs are then used as the inputs of the ELM, from which the final forecasting result is obtained. This takes full advantage of the ELM to enhance the generalization ability and address the overfitting problem. Moreover, the training speed can also be improved.
- The parameters of the model are optimized by least square estimation in order to further improve its prediction performance. Least square estimation not only yields the optimal parameters but also keeps the training fast.
- The proposed hybrid model is applied to predict the output of a real PV power generation system, and detailed comparisons are given. Both the real application and the comparisons indicate that the proposed hybrid model outperforms other models in terms of the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute error percentage (MAEP) and the mean relative error (MRE).
The rest of this paper is organized as follows: the proposed hybrid ensembled model is presented in detail in Section II. Both experiments and comparisons are given in Section III. Finally, the conclusions are given in Section IV.

THE PROPOSED HYBRID ENSEMBLED MODEL
In this section, the proposed hybrid ensembled model is described in detail. Firstly, the framework of the proposed hybrid ensembled model is presented. Then the detailed design of the DIFM is given. Furthermore, the ELM and the model integration are described. Finally, the training of the hybrid ensembled model is demonstrated.

Framework of the proposed hybrid model
The structure of the proposed hybrid model is illustrated in Fig. 1, where each DIFM takes two input variables selected from the original dataset through a moving window with step length one. The least square estimator is applied to optimize the consequents. Then the outputs of the different DIFMs are used as the inputs of the ELM to obtain the final forecasting result. Similarly, the parameters of the model are also optimized by the least square estimation method. More specifically, the prediction process in this study is as follows:
- First, the DIFMs are utilized to handle the uncertainties hidden in the original dataset and to extract the features. A moving window is applied to select the input variables in order to fully mine the underlying information. In this part, least square estimation is adopted to generate the fuzzy rules and to optimize the parameters of each DIFM.
- Second, the features extracted by the DIFMs form the input variables of the ELM part, from which the final result is obtained. Similarly, the parameters of the ELM are optimized by least square estimation in order to further improve the forecasting accuracy.
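The moving-window input selection described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes that a fixed number of consecutive past power values (`n_lags`, a hypothetical parameter) is used to predict the next value, and that DIFM-i receives the adjacent pair of lags i and i+1. The function names and the pairing scheme are not taken from the paper.

```python
import numpy as np

def make_lagged_samples(series, n_lags):
    """Build (X, y): each row of X holds n_lags consecutive values, y is the next value."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    y = series[n_lags:]
    return X, y

def difm_input_pairs(X):
    """Slide a window of width two with step one over the lag columns.

    Returns one (n_samples, 2) array per fuzzy module (assumed pairing).
    """
    return [X[:, i:i + 2] for i in range(X.shape[1] - 1)]

# Toy series standing in for the PV power measurements.
series = np.arange(10.0)
X, y = make_lagged_samples(series, n_lags=4)
pairs = difm_input_pairs(X)
print(X.shape, y.shape, len(pairs))
```

With `n_lags` inputs, this pairing yields `n_lags - 1` modules, each of which sees only two variables, so the rule base of every module stays small.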

Double-input-fuzzy-modules
For each DIFM, triangular membership functions (MFs) are adopted for the two input variables. The MFs for the input variables are given in Fig. 2. In the fuzzy rules, $c$ is the crisp consequent and $m$ is the number of fuzzy sets for each input variable. Therefore, the number of fuzzy rules of each DIFM is $m^2$.
Finally, the output of DIFM-i can be obtained as follows.
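A double-input fuzzy module of this kind can be sketched in a few lines. The sketch below is a hedged illustration, not the paper's exact formulation: it assumes evenly spaced triangular MFs, product inference, weighted-average defuzzification, and crisp consequents fitted by ordinary least squares. All function names are hypothetical.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees of x in evenly spaced triangular fuzzy sets.

    x: (n,) array; centers: (m,) increasing, evenly spaced. Returns (n, m).
    """
    x = np.clip(np.asarray(x, dtype=float), centers[0], centers[-1])
    width = centers[1] - centers[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - centers[None, :]) / width)

def firing_matrix(pair, centers):
    """Normalized firing strengths of the m*m rules for two-input samples."""
    mu1 = triangular_memberships(pair[:, 0], centers)
    mu2 = triangular_memberships(pair[:, 1], centers)
    # Product inference (assumed): one strength per combination of fuzzy sets.
    w = (mu1[:, :, None] * mu2[:, None, :]).reshape(len(pair), -1)
    return w / w.sum(axis=1, keepdims=True)

def fit_consequents(pair, y, centers):
    """Least square estimate of the m*m crisp consequents."""
    phi = firing_matrix(pair, centers)
    c, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return c

def difm_output(pair, c, centers):
    """Weighted average of the crisp consequents (assumed defuzzification)."""
    return firing_matrix(pair, centers) @ c

# Toy check on a target that the module can represent exactly.
rng = np.random.default_rng(0)
pair = rng.uniform(0.0, 1.0, size=(200, 2))
y = pair[:, 0] + pair[:, 1]
centers = np.linspace(0.0, 1.0, 3)   # m = 3 fuzzy sets per input, 9 rules
c = fit_consequents(pair, y, centers)
y_hat = difm_output(pair, c, centers)
```

Because the output is linear in the consequents once the MFs are fixed, the least square fit has a closed-form solution, which is consistent with the fast training that the paper attributes to least square estimation.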

Extreme learning machine
As one of the most popular single hidden layer feedforward neural networks, the ELM has demonstrated superior performance in terms of learning speed and approximation ability. For the ELM, the standard input-output mapping can be expressed as follows.
where $L$ is the number of hidden neurons and is set before training, and $\hat{y}_l$ and $y_l$ are the $l$-th predicted outcome and the $l$-th real sample. Subsequently, equation (2) can be rewritten as
$$\mathbf{y} = \mathbf{H}\boldsymbol{\beta} \quad (5)$$
where $\mathbf{H}$ is the output of the hidden layer (denoted as the training matrix) and $\boldsymbol{\beta}$ is the vector of output weights between the hidden layer and the output layer. These quantities are obtained from the training data set, with the target vector given by
$$\mathbf{y} = [y_1, y_2, \ldots, y_N]^{T} \quad (8)$$
Hence, the output weights can be computed by
$$\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{y} \quad (9)$$
where $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudo inverse of the training matrix $\mathbf{H}$.
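The pseudo-inverse solution for the output weights can be illustrated with a minimal ELM sketch. It assumes a sigmoid activation and standard normal random hidden weights, neither of which is specified in the text above; in the proposed model the rows of `X` would be the DIFM outputs, whereas here a toy input is used. The function names are hypothetical.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H with a sigmoid activation (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, y, n_hidden, rng):
    """Draw random hidden parameters, then solve the output weights in closed form."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed, never trained
    b = rng.standard_normal(n_hidden)
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ y                     # Moore-Penrose pseudo inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return hidden_output(X, W, b) @ beta

# Toy regression problem standing in for the DIFM outputs and PV targets.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
W, b, beta = elm_fit(X, y, n_hidden=20, rng=rng)
y_hat = elm_predict(X, W, b, beta)
```

Only `beta` is fitted, and it is obtained in a single linear-algebra step, which is the source of the learning speed mentioned above.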

Models training
For the proposed ensembled model, the model training process on the training data set can be described as follows.
Step 1: based on equation (3), ...
Step 3: the weight matrix can be computed according to equation (9) and optimized by the least square method.

EXPERIMENTS and DISCUSSIONS
In this section, the proposed ensembled model is applied to a real-world PV power generation prediction task. Moreover, it is compared with several popular prediction methods in order to verify its advantages. Hence, these prediction methods are introduced first. Then four popular metrics are given for the performance comparison. Finally, both the experimental results and the comparisons are presented in detail.

Several methods for comparison
In the following, the proposed ensembled model will be compared with some popular methods, e.g., ANFIS, the single-input-rule-modules connected fuzzy model (SIRM-FM) [31], SVR and the multiple linear regression (MLR) [32].
ANFIS combines a neural network with fuzzy reasoning, and uses mathematical programming (least squares estimation) together with a gradient-based algorithm to optimize the parameters. It is one of the most popular and powerful prediction approaches and has been widely used in many applications [14][15].
SIRM-FM is a kind of modular fuzzy model that constructs a fuzzy rule module for each input variable and then aggregates all the single-input-rule-modules of the different input variables to obtain a crisp output. It has been widely applied in various fields, e.g., building energy consumption prediction [29], traffic flow prediction [30], etc.
SVR is an important variant of SVM. Kernel functions are used to solve the prediction problem, in order to achieve strong generalization ability and good prediction performance.
In MLR, the prediction is achieved by analyzing the mathematical correlation between the model variables and the observed data of the samples. It has also been widely applied in solar energy prediction [32], river discharge forecasting [33], and so on.

Metrics
In order to evaluate the performance of all methods, four metrics, namely MAE, RMSE, MSPE and MRE, are used in our study, where $N$ is the number of training or test samples, and $\hat{y}_l$ and $y_l$ are the $l$-th predicted outcome and the corresponding real sample. For all of these indices, larger values indicate larger gaps between the predicted values and the real samples, and hence worse prediction performance.
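As an illustration of how such indices are computed, the sketch below implements MAE and RMSE in their standard forms. The MRE shown is one common definition (mean absolute error relative to the real value) and is an assumption, as is the handling of near-zero targets such as night-time PV output; MSPE is left out because its exact form cannot be inferred from the text above.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mre(y_true, y_pred, eps=1e-8):
    """Assumed definition: mean of |error| / |real value|.

    Samples whose real value is (near) zero are skipped to avoid division by zero.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = np.abs(y_true) > eps
    return float(np.mean(np.abs(y_pred[mask] - y_true[mask]) / np.abs(y_true[mask])))

y_true = [1.0, 2.0, 4.0]
y_pred = [2.0, 2.0, 2.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mre(y_true, y_pred))
```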

Applied data set
In the experiments, the PV power generation data were collected from a real PV power generation system located in an area of Germany. This data set can be obtained from http://www.elia.be/nl. The sampling cycle is 15 minutes and the data span from Jan 1, 2016 to Jun 30, 2016, with a total of 17,472 sample points. The sample data of the first five months are used as the training data set, and the sample data of the last month are used as the testing data set.
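The stated sample count can be checked with a short calculation, which also gives the sizes of the training and testing sets. The sketch assumes that every 15-minute slot in the period is present, i.e., there are no missing records.

```python
from datetime import date

SAMPLES_PER_DAY = 24 * 60 // 15  # 96 samples per day at a 15-minute cycle

start = date(2016, 1, 1)
split = date(2016, 6, 1)   # first day of the testing month
end = date(2016, 7, 1)     # day after the last sample day (Jun 30, 2016)

n_total = (end - start).days * SAMPLES_PER_DAY
n_train = (split - start).days * SAMPLES_PER_DAY
n_test = n_total - n_train
print(n_total, n_train, n_test)  # 17472 14592 2880
```

The period covers 182 days because 2016 is a leap year, so the total agrees with the 17,472 points reported above, and the split is roughly 84% training to 16% testing.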

Experimental results and discussion
The prediction results of the five different models are shown in Fig. 3, for instance Fig. 3(a) for the hybrid ensembled model. The corresponding performance indices are listed in Table 1. Table 1 demonstrates that the MAE, RMSE, MSPE and MRE of the hybrid ensembled model are all better than those of the other methods. For better visualization, the corresponding testing results are also drawn in Fig. 4. Moreover, the box-plots of the absolute prediction errors are given in Fig. 5. From Fig. 5, it can be clearly seen that the proposed hybrid ensembled model has the smallest median absolute forecasting error and a smaller height between the bottom and top edges of its box-plot. This again verifies the advantages of the proposed hybrid ensembled model.
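The two box-plot quantities referred to above, the median of the absolute errors and the height of the box (the interquartile range), can be computed directly. The sketch below uses toy numbers rather than the paper's results, and the function name is hypothetical.

```python
import numpy as np

def boxplot_summary(y_true, y_pred):
    """Median and interquartile range of the absolute prediction errors."""
    abs_err = np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))
    q1, median, q3 = np.percentile(abs_err, [25, 50, 75])
    return {"median": float(median), "iqr": float(q3 - q1)}

# Toy data only: real samples of zero and errors of 0, 1, 2, 3, 4.
summary = boxplot_summary(np.zeros(5), [0.0, 1.0, 2.0, 3.0, 4.0])
print(summary)
```

A lower median indicates a smaller typical error, while a smaller interquartile range indicates that the errors are less dispersed.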

CONCLUSION and FUTURE WORK
In order to handle the uncertainties of the PV power generation system and improve the prediction performance, this study took advantage of the ability of fuzzy logic to deal with uncertainty and the strength of machine learning in time series prediction, and combined the DIFM and the ELM into a hybrid ensembled DIFM based PV power generation prediction model. In the proposed hybrid ensembled model, the original data set was processed by the DIFMs in order to deal with the uncertainties. Then the outputs of the DIFMs were taken as the inputs of the ELM to obtain the final prediction. Furthermore, least square estimation was adopted to optimize the parameters of the proposed model in order to further improve the prediction performance. Both experiments and comparisons were given. The comparative results demonstrated that the proposed hybrid ensembled model outperformed the other models in terms of MAE, RMSE, MSPE and MRE.
In this study, the data set collected from an area of Germany did not include some important factors, e.g., load changes, solar irradiance changes, location differences, human activities, etc. Therefore, the prediction performance may be further enhanced by taking these factors into the model. On the other hand, the proposed hybrid ensembled model can also be used to handle similar time series prediction problems, e.g., electrical load prediction, traffic flow prediction, indoor temperature prediction, etc. These two points will be studied in our future work.