Optimized Machine Learning-Based Forecasting Model for Solar Power Generation by Using Crow Search Algorithm and Seagull Optimization Algorithm

Forecasting solar power is an important task for power trading companies: it supports energy bidding, planning, and control. The challenge in forecasting is predicting nonlinear data, which calls for combining a computation technique with a machine learning (ML) model. ML models achieve high accuracy for time-series forecasting, but their accuracy degrades on nonlinear data. To strengthen the ML model for nonlinear prediction, an optimization algorithm is used for training. This paper presents how a computation technique is incorporated into a machine learning model and compares the result with the conventional model. CSA-ANN and SOA-ANN models are developed to forecast a-day-ahead, three-day-ahead, and week-ahead solar power generation, taking time, irradiation, and temperature as input parameters. The models are compared with ANN, DE-ANN, and PSO-ANN, since these models are widely used. On comparison, ANN gives the best result for short-term prediction but is unable to make midterm and long-term predictions; SOA-ANN overcomes this limitation by changing the training algorithm, and its performance is measured via statistical parameters such as MAE, MSE, MAPE, and R2. The percentage improvement of SOA-ANN on these statistical parameters is 6.54%, 16.05%, 1.67%, and 3.61%. Hence, SOA-ANN gives the best result compared with the other models.


SVM
Support vector machine

Introduction
Over the last two decades, electricity demand has kept growing due to developments in science and technology. Technology has changed every aspect of life, such as ease of access to information, time savings, ease of mobility, better communication, system cost and efficiency, innovation in many fields, and artificial intelligence. In the present scenario, technology depends entirely on electricity, so technology and electricity grow hand in hand. An increase in electricity demand is a burden on power sector companies, since generation has to meet demand. Hence, power sector companies are forced to look for solutions, and one option is to rely on renewable energy sources such as solar and wind. Both of these sources are nature dependent and intermittent. To rely on renewable energy sources, a power sector company has to forecast based on past data. Forecasting has many benefits, such as scheduling, planning, coordination among generating stations and devices, maximum utilization of power generating plants, minimized bidding risk, and reliable and economic operation of the grid and interconnected system. Solar energy is highly nonlinear since it is nature dependent and intermittent. A fast and accurate forecasting model that can handle nonlinear data is therefore required.
In the literature, many researchers have created forecasting models for linear data, time-series data, and nonlinear data, but they are not sufficiently accurate, so this will always remain an area of research. Solar power generation (SPG) depends on meteorological factors such as temperature, irradiation, pressure, wind, module temperature, direction, and humidity [1]; some of these factors are taken for prediction purposes. In real time, unpredicted solar power affects the reliability, scheduling, and stability of the system [2,3], so forecasting helps to minimize these effects. Based on the availability of data, forecasting is classified into three categories: short-term forecasting, useful for unit commitment, generation control, and energy trading; and medium-term and long-term forecasting, useful for future planning and for handling the operation of grid and transmission systems [4]. Various statistical forecasting models have been developed, such as the autoregressive moving average (ARMA) [5], autoregressive integrated moving average (ARIMA) [6], autoregressive moving average with exogenous inputs (ARMAX) [7], and statistical time-series models [8], which support linear data but do not give promising results for nonlinear data. Various methods have been presented in the literature to handle nonlinear data, such as optimization algorithms, machine learning, fuzzy-based systems, and hybridizations of these. Machine learning can learn and adapt to uncertainties in the system, which is useful for prediction applications. This ability of ML has shown great performance in image recognition, signal recognition, stock market trading, etc.
[9-11]. Various ML models have been developed for forecasting irradiation and power, such as ANN [12,13], SVM-based estimation and evaluation of solar irradiation [14,15], power generation forecasting based on satellite images and SVM [16,17], deep learning algorithms [18] including power generation forecasting with integrated wavelets [19], and ensemble learning approaches with autoencoder-based ELM for feature learning [20,21]. Some researchers have also presented hybrid models to improve forecasting, such as fuzzy logic integrated with ANN [22]; in contrast to the ARMA model, the time-delay neural network shows enhanced performance, and hybridizing the two yields stable and accurate results [5]. ML by itself is a good approach to problem solving, but as its usage grows across applications, new problems arise, such as the possibility of high error, algorithm selection, data acquisition, underfitting, and overfitting. To solve these problems, advancements in ML have appeared, such as hybrid models; backpropagation (BP) [23], in which error is propagated backward to minimize it; the recurrent neural network (RNN) [24], widely used because of its internal memory; and the Echo State Network (ESN) [25], which uses dynamic neurons. Apart from this, ML performance can be enhanced by tuning ML parameters such as weights and biases, or the algorithm parameters. One such approach is to use an optimization algorithm.
Optimization algorithms have been used in many engineering problems because they find the best possible solution for a given problem. Various optimization algorithms have been developed; a review of different algorithms is illustrated in [26], covering seventeen biology-based, five physics-based, and two geography-based optimization algorithms. A comparison of six different optimization algorithms on the performance of the cam-follower mechanism is illustrated in [27]. In this work, the crow search algorithm (CSA) and the seagull optimization algorithm (SOA) [28,29] are used to train the model so as to obtain optimized weight values with minimum error and reduced training time. Highlights of this paper are as follows:
• Two models are developed for prediction, ANN and optimized ANN (OANN).
• Pseudocode is presented for ANN, CSA-ANN, and SOA-ANN.
• Two different approaches are presented for incorporating optimization algorithms into machine learning.
• A-day-ahead, three-day-ahead, and week-ahead predictions are presented.

Crow Search Algorithm
CSA was introduced by Alireza Askarzadeh in 2016. It is a nature-inspired, population-based algorithm that works on the intelligence of crows searching for food. Crows steal food from other birds' hiding places and, at the same time, take precautions so that other crows or birds cannot find their own hiding places.
This intelligence has been developed into an optimization algorithm for many engineering applications. Implementation of the crow search algorithm involves four stages modeled on this behavior: crows live in groups, memorize hiding places, search others' hiding places for food, and protect their own food from theft. In the optimization process, these four stages are the initialization of the crows and their memory locations, updating the crows' positions in search of food according to their memories, and protecting food from thievery, as expressed in Eqs. (1), (2), (3), and (4).
Crows: X = [X^1; X^2; ...; X^N]   (1)
Memory: m = [m^1; m^2; ...; m^N]   (2)

where 'N' and 'd' are the group size and dimension (number of decision variables) of the search space, so that X and m are [N × d] matrices. Each row of X represents a candidate solution to the problem, and each row of m represents the memory of a crow, i.e., where its food is hidden; at the initial state the crows have no experience, so the hidden-place positions are initialized randomly. The fitness is obtained by substituting the decision variables into the objective function. The ith crow then tries to steal food by observing the jth crow; the jth crow, aware of this intention, may fool the ith crow by changing the location of its food. The ith crow updates its position and memory as:

Update position:
X^(i,iter+1) = X^(i,iter) + r_i × fl^(i,iter) × (m^(j,iter) − X^(i,iter))   if r_j ≥ AP^(j,iter)
X^(i,iter+1) = a random position   otherwise   (3)

Update memory:
m^(i,iter+1) = X^(i,iter+1)   if f(X^(i,iter+1)) is better than f(m^(i,iter))
m^(i,iter+1) = m^(i,iter)   otherwise   (4)

where 'r_i' and 'r_j' are two random numbers in [0, 1]. Comparing r_j with the awareness probability 'AP' (fixed here at 0.1) decides whether a crow protects its food from the others: a low 'AP' enhances intensification, while a high 'AP' enhances diversification. 'fl' is the flight length, whose value is 2.
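The four stages above can be sketched in code. The following is a minimal illustrative implementation of CSA minimizing a sphere test function; the parameter values AP = 0.1 and fl = 2 follow the text, while the function names, bounds, and test objective are assumptions for the sketch, not part of the paper.

```python
import numpy as np

def crow_search(objective, dim, n_crows=20, iters=100, lb=-10.0, ub=10.0,
                ap=0.1, fl=2.0, seed=0):
    rng = np.random.default_rng(seed)
    # Eqs. (1)-(2): initialize crow positions X and memories m
    X = rng.uniform(lb, ub, (n_crows, dim))
    m = X.copy()                                   # best position each crow remembers
    fit_m = np.array([objective(x) for x in m])
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.integers(n_crows)              # crow i follows a random crow j
            if rng.random() >= ap:
                # Eq. (3): move toward crow j's memorized food location
                X[i] = X[i] + rng.random() * fl * (m[j] - X[i])
            else:
                # crow j is aware of being followed: crow i ends up at a random position
                X[i] = rng.uniform(lb, ub, dim)
            X[i] = np.clip(X[i], lb, ub)
            # Eq. (4): update memory only if the new position is better
            f = objective(X[i])
            if f < fit_m[i]:
                m[i], fit_m[i] = X[i].copy(), f
    best = np.argmin(fit_m)
    return m[best], float(fit_m[best])

# Usage: minimize the 2-D sphere function, whose optimum is at the origin.
best_x, best_f = crow_search(lambda x: float(np.sum(x**2)), dim=2)
```

Note that the greedy memory update in Eq. (4) is what makes the best-so-far solution monotonically improve over iterations.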

Seagull Optimization Algorithm
Seagulls are omnivorous seabirds that eat insects, reptiles, earthworms, fish, etc. Their scientific family name is Laridae, and they are found all over the planet. They are intelligent: they tap their feet to attract hidden earthworms from under the ground by producing a sound like rain, and they attract fish with bread crumbs. This intelligence helps them find food while migrating. The attacking and migrating behavior of seagulls is modeled as an optimization method for solving various engineering problems. In the mathematical model, the optimization works in two phases, exploration and exploitation, inspired by the behavior of seagulls during migration and attack. Exploration (migration of seagulls): Exploration is carried out in three phases to move from one position to another in the search space.
(1) Avoiding collision of seagulls: A variable 'A' is introduced to avoid collision between seagulls (i.e., search agents) while computing their new positions:

C_s = A × P_s(x)   (5)

where 'A' represents the movement behavior of the search agent in the search space, 'P_s' represents the current position of the seagull, 'x' indicates the current iteration, and 'C_s' represents the position of a search agent that does not collide with another search agent.

A = f_c − (x × (f_c / Max_iteration))   (6)

where 'f_c' is set to 2 and controls the frequency of variable 'A', which decreases linearly from 'f_c' to 0.
(2) Movement toward the best neighbor's direction: After successfully avoiding collisions with neighboring search agents, each search agent moves toward the best neighbor:

M_s = B × (P_bs(x) − P_s(x))   (7)

where 'P_s' represents the position of the search agent, 'P_bs' represents the position of the best search agent, and 'M_s' represents the movement of the search agent toward the best search agent. To balance the exploration and exploitation phases, a variable 'B' is introduced and randomized with a variable 'rd' drawn from [0, 1]:

B = 2 × A² × rd   (8)
Exploitation (attacking behavior of seagulls): Exploitation refers to utilizing knowledge, experience, and history during the search process. Seagulls attack their prey in a spiral manner, which can be modeled in the x, y, and z planes; they can change their speed and angle of attack continuously.
x' = r × cos(k),  y' = r × sin(k),  z' = r × k   (10)-(12)
r = u × e^(kv)   (13)

where 'k' is a random number in [0, 2π], 'u' and 'v' are constants defining the spiral shape, both set to 1, 'e' is the base of the natural logarithm, and 'r' is the radius of each turn of the spiral. Finally, the search agents update their positions with respect to the best search agent, as modeled in Eq. (14):

P_s(x) = (D_s × x' × y' × z') + P_bs(x)   (14)

where P_s(x) saves the best solution and updates the position of the search agent.
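The migration and attack phases above can be combined into a short sketch. The following illustrative implementation of SOA minimizes a sphere test function; f_c = 2 and u = v = 1 follow the text, while the names, bounds, and test objective are assumptions for the sketch.

```python
import math
import numpy as np

def seagull_optimize(objective, dim, n_agents=30, max_iter=100,
                     lb=-10.0, ub=10.0, fc=2.0, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.uniform(lb, ub, (n_agents, dim))       # seagull positions
    fits = np.array([objective(p) for p in P])
    best = P[np.argmin(fits)].copy()
    fbest = float(fits.min())
    for x in range(max_iter):
        A = fc - x * (fc / max_iter)               # Eq. (6): decreases fc -> 0
        for i in range(n_agents):
            Cs = A * P[i]                          # Eq. (5): collision avoidance
            B = 2 * A**2 * rng.random()            # Eq. (8): exploration/exploitation balance
            Ms = B * (best - P[i])                 # Eq. (7): move toward the best agent
            Ds = np.abs(Cs + Ms)                   # Eq. (9): closeness to the best agent
            k = rng.uniform(0, 2 * math.pi)
            r = math.exp(k)                        # Eq. (13) with u = v = 1
            spiral = (r * math.cos(k)) * (r * math.sin(k)) * (r * k)  # x' * y' * z'
            P[i] = np.clip(Ds * spiral + best, lb, ub)                # Eq. (14)
            f = objective(P[i])
            if f < fbest:                          # keep the best seagull found so far
                best, fbest = P[i].copy(), f
    return best, fbest

best_x, best_f = seagull_optimize(lambda p: float(np.sum(p**2)), dim=2)
```

As 'A' shrinks toward 0 over the iterations, D_s shrinks with it, so the agents spiral ever more tightly around the best position found, which is the exploitation behavior the text describes.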

ANN and OANN
A computational model for neural networks was first introduced in 1943 [30], and the first perceptron-based artificial neural network was introduced in 1957. An ANN consists of three layers: an input layer, a hidden layer, and an output layer. The ANN structure used in this work is portrayed in Fig. 1: the input layer consists of three nodes, each representing a solar parameter (time, solar irradiation, and temperature); the hidden layer consists of five nodes with a sigmoidal activation function; and the output layer consists of a single node with a sigmoidal activation function. Weights between the input layer and the hidden layer are represented by W21, and those between the hidden layer and the output layer by W32; biases in the hidden and output layers are not considered in this work. The work presented in this paper consists of two parts. In the first part, the ANN is trained and tested with the simple architecture portrayed in Fig. 1; in the second part, weights W21 and W32 are updated by an optimization algorithm as discussed in Section II, the aim being to train the network so that its performance is enhanced. The important part of the work is how the network is trained. This section illustrates the training process with the conventional approach and with the optimized approach.
The conventional approach of training ANN: For training any network, the first and most important step is collecting data from a valid source. In this work, the data are collected from Chhattisgarh State Power Transmission Company Limited (CSPTCL) from February 2019 to December 2019. Hourly data are collected, of which only useful points are kept, i.e., points where output is produced (nonzero data; night-time output is zero and is excluded from the training part). The data are normalized between 0 and 1 for proper handling. They consist of inputs and outputs, in which time, solar irradiance, and temperature are the input parameters (X_i [3 × 1]) and solar power is the target (X_t [1 × 1]). The weights are then randomly initialized, and the network is ready for training. The hidden-layer values and the error are calculated mathematically as:

Input to the hidden layer: I_H1 = W21 × X_i   (15)
Output of the hidden layer: O_H1 = 1 / (1 + e^(−I_H1))   (16)
Input to the output layer: I_O = W32 × O_H1   (17)
Output of the output node: Y = 1 / (1 + e^(−I_O))   (18)
Error: E = Y − X_t   (19)
W32,new = W32,old + alpha × E × delta × O_H1   (20)
W21,new = W21,old + alpha × E × delta × W32 × del_H1 × X_i   (21)
delta = Y × (1 − Y)   (22)
del_H1 = O_H1 × (1 − O_H1)   (23)

Equations (15)-(18) are the mathematical calculation of the ANN output, Eqs. (20) and (21) update the weights, the error is calculated with Eq. (19), and the derivatives of the output layer and hidden layer are obtained from Eqs. (22) and (23). The net weight is obtained in one of two ways: by averaging the weights obtained over all input samples, or by selecting, among the weights obtained from all input samples, the one whose mean square error is minimum. In this work, the second approach is used to calculate the net weight, not the average weight. The total input sample size is 3000 × 3, i.e., 3000 samples, and the output size is 3000 × 1.
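The forward pass and weight updates of Eqs. (15)-(23) can be written out directly. The sketch below trains the 3-5-1 sigmoid network described above on synthetic stand-in data (the paper uses CSPTCL hourly measurements); the learning rate alpha and the synthetic target are assumptions, and since E = Y − X_t here, the updates are written as descent (subtraction).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.random((200, 3))                       # normalized time, irradiation, temperature
t = sigmoid(X @ np.array([0.5, 1.5, -0.7]))    # synthetic normalized solar power target

W21 = rng.uniform(-1, 1, (5, 3))               # input -> hidden weights
W32 = rng.uniform(-1, 1, (1, 5))               # hidden -> output weights
alpha = 0.5                                    # learning rate (assumed)

for _ in range(300):                           # epochs over all samples
    for xi, ti in zip(X, t):
        o_h = sigmoid(W21 @ xi)                # Eqs. (15)-(16): hidden-layer output
        y = sigmoid(W32 @ o_h)[0]              # Eqs. (17)-(18): network output
        e = y - ti                             # Eq. (19): E = Y - X_t
        d_o = y * (1 - y)                      # Eq. (22): output-layer derivative
        d_h = o_h * (1 - o_h)                  # Eq. (23): hidden-layer derivative
        # Eqs. (20)-(21): per-sample weight updates
        W32 -= alpha * e * d_o * o_h
        W21 -= alpha * e * d_o * (W32.ravel() * d_h)[:, None] * xi

mse = float(np.mean((sigmoid(W32 @ sigmoid(W21 @ X.T)).ravel() - t) ** 2))
```

The final `mse` is the quantity the paper minimizes when choosing the net weight among the per-sample candidates.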

OANN
The objective of using ML or deep learning algorithms is to approximate an unknown function, i.e., to find a relation between variables. An algorithm trains the network attributes by minimizing a loss function. Various algorithms have been reported in the literature for minimizing the loss, such as gradient descent [31], backpropagation [32], stochastic gradient descent [33], conjugate gradient descent [34], the steepest descent algorithm [35], and the Rprop algorithm [36]. The advantages of these algorithms are easy implementation, ease of understanding, continuous learning, wide applicability, and efficiency, but they also have certain demerits: high computation time (except for Rprop), veering off due to frequent updates, very slow convergence, sensitivity to noise and irregularities, and the risk of being trapped in local minima. To overcome these challenges, optimization-based ML is used. The process of incorporating an optimization algorithm into ML is discussed via pseudocode, and a generalized structure for implementing any optimization algorithm in ML is given as a flowchart in Fig. 2.

ANN training with optimization algorithm
The most important factor in any optimization problem is the objective function. In this work, the mean square error (MSE) is taken as the objective function, as expressed in Eq. (24). The population size is taken as 100 and the number of iterations as 50. This is the benefit of the optimization algorithm: the minimum error is achieved with a smaller number of iterations, whereas in the conventional approach 10,000 iterations are needed to reach the minimum error. Here, as in the weight calculation for ANN, the net weight is taken as the weight, among those obtained from all the input samples, whose error is least.
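The key step described above is treating the ANN weights as the decision vector of the optimizer, with MSE over the training set as the objective (Eq. (24)). The sketch below shows that wiring with the population size of 100 and 50 iterations from the text; a plain best-candidate random search stands in for CSA/SOA (which would move the population by their own rules), and the synthetic data and function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ann_mse(w, X, t):
    """Objective, Eq. (24): MSE of the 3-5-1 network encoded by weight vector w."""
    W21 = w[:15].reshape(5, 3)                 # input -> hidden weights
    W32 = w[15:].reshape(1, 5)                 # hidden -> output weights
    y = sigmoid(W32 @ sigmoid(W21 @ X.T)).ravel()
    return float(np.mean((y - t) ** 2))

rng = np.random.default_rng(2)
X = rng.random((200, 3))                       # synthetic stand-in inputs
t = sigmoid(X @ np.array([0.5, 1.5, -0.7]))    # synthetic stand-in target

pop = rng.uniform(-1, 1, (100, 20))            # population: 100 candidate weight vectors
best = min(pop, key=lambda w: ann_mse(w, X, t))
for _ in range(50):                            # 50 iterations, as in the text
    # Stand-in update: perturb the best candidate; CSA/SOA would instead
    # move each member according to Eqs. (3)-(4) or Eqs. (5)-(14).
    pop = best + rng.normal(0.0, 0.1, (100, 20))
    cand = min(pop, key=lambda w: ann_mse(w, X, t))
    if ann_mse(cand, X, t) < ann_mse(best, X, t):
        best = cand
final_mse = ann_mse(best, X, t)
```

The same `ann_mse` objective can be handed unchanged to any population-based optimizer, which is what makes the two approaches in the paper interchangeable at the training stage.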

Predict the output
Optimization algorithms have many advantages: they can solve complex problems, are robust in nature, converge fast, and provide a global solution. Incorporating an optimization algorithm into ML gives good results, as reported in the literature for many applications: [38] illustrates DE-based ANN for time-series forecasting and for nonlinear forecasting over very short time intervals, PSO-based ANN is illustrated in [39-41], GA-based ANN is illustrated in [42], and a comparison between GA-ANN and PSO-ANN in [43] shows that GA-ANN has higher accuracy than PSO-ANN and the BP algorithm. This paper demonstrates a comparison between ANN, CSA-ANN, and SOA-ANN along with existing methods such as DE-ANN and PSO-ANN.

Statistical Measures
Statistical measures are used to assess how close the predictions are to the actual data. Mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the coefficient of determination (R²) [4] are used to evaluate the performance of the model; these four performance indices are mathematically represented in Eqs. (25)-(28). If the sample size is large and the prediction depends on one single quantity (in this work, the weights are the important parameter for both ANN and OANN), then reducing MSE is the most advantageous performance measure. (In simulation studies, the integral time absolute error (ITAE) and integral time square error (ITSE) are used instead.) In general, the value of MSE is larger than that of MAE. The coefficient of determination (R²) of the regression line ranges between 0 and 1; a value near one indicates a strong linear relation. MAPE is associated with the accuracy of the predicted data: a smaller value represents higher accuracy.
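The four indices above can be written out explicitly. The sketch below uses the standard definitions (Eqs. (25)-(28) are not reproduced in this excerpt, so these formulas are the conventional ones); the sample arrays are placeholders, and MAPE assumes no zero actual values, consistent with the paper's exclusion of zero night-time output.

```python
import numpy as np

def metrics(actual, predicted):
    """Return (MAE, MSE, MAPE %, R^2) for two equal-length series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mae = float(np.mean(np.abs(err)))                    # mean absolute error
    mse = float(np.mean(err ** 2))                       # mean square error
    mape = float(np.mean(np.abs(err / actual)) * 100)    # mean absolute % error
    ss_res = float(np.sum(err ** 2))                     # residual sum of squares
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                           # coefficient of determination
    return mae, mse, mape, r2

# Usage with placeholder values:
mae, mse, mape, r2 = metrics([2.0, 4.0, 6.0], [2.5, 4.0, 5.5])
```

As the text notes, the best model has minimum MAE, MSE, and MAPE and maximum R².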

Fig. 3 Comparison of forecasted model output with actual data for a-day-ahead prediction

Results and Discussion
Solar power generation forecasting for Chhattisgarh State Power Transmission Company Limited (CSPTCL) is done for the short term (a-day-ahead) and midterm (three-day-ahead and week-ahead prediction). (Note: short-term covers hour-ahead, a-day-ahead, and two-day-ahead prediction; midterm covers 3-15 days ahead, i.e., week- or two-week-ahead prediction.) The data used in the work are collected from CSPTCL from February 2019 to December 2019. Hourly data are considered for the forecasting model, with time, temperature, and irradiation taken as inputs. The data are normalized between 0 and 1 for proper handling and finally de-normalized to their original values for comparison with the actual data. The models used in the work are ANN and optimized ANN. Two optimization-based forecasting models are developed, CSA-ANN and SOA-ANN. These models are compared with conventional ANN along with existing methods, DE-ANN and PSO-ANN, to validate the work for short-term and midterm durations. From Figs. 3 and 4, it is observed that ANN, DE-ANN, and SOA-ANN are comparable with one another, while PSO-ANN performs poorly and reaches a saturation point; CSA-ANN can follow the path of the actual data (i.e., the slope of the actual data matches the slope of the CSA-ANN forecast). This is also observable in the statistical comparison between the models tabulated in Tables 1 and 2, where the best results are shown in bold. For a-day-ahead prediction, ANN performs best, with DE-ANN close to the ANN result. But as the prediction horizon increases, ANN and DE-ANN perform poorly compared with SOA-ANN.
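The normalization and de-normalization step mentioned above is a simple min-max transform. A minimal sketch, with hypothetical power values standing in for the CSPTCL data:

```python
import numpy as np

def normalize(x):
    """Min-max scale x to [0, 1]; return the scale bounds so it can be inverted."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi

def denormalize(xn, lo, hi):
    """Map values in [0, 1] back to the original units."""
    return np.asarray(xn, dtype=float) * (hi - lo) + lo

power = np.array([12.0, 30.0, 48.0, 21.0])   # hypothetical solar power values (MW)
scaled, lo, hi = normalize(power)            # scaled to [0, 1] for training
restored = denormalize(scaled, lo, hi)       # back to original units for comparison
```

Keeping `lo` and `hi` from the training data is what allows the forecasts to be de-normalized for comparison with the actual data, as described above.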
From Fig. 5, it is observed that during bad weather, i.e., on the 6th day (120-144 h), ANN performs poorly, whereas the OANN models can predict bad-weather days as well. It is also observed that on that particular day SOA-ANN is very close to the actual data; it is therefore concluded that SOA-ANN can predict low as well as high values. Still, the performance of a model cannot be judged from the figures alone; a conclusion requires analyzing the statistical parameters. The performance of the forecasting models is evaluated on the statistical parameters MAE, MSE, MAPE, and R², which are widely used in forecasting evaluation. Minimum values of MAE, MSE, and MAPE and a maximum value of R² represent the best-performing forecasting model. A comparison of the forecasting models on these statistical parameters for short-term and midterm prediction is therefore shown in tabular form: Table 1 presents the comparison for a-day-ahead prediction, Table 2 for three-day-ahead prediction, and Table 3 for week-ahead prediction.
From Table 1, it is observed that ANN exceeds the performance of both OANN models. It can be concluded that for short-term prediction, ANN gives better results than the other models.
From Table 2, it is observed that ANN exceeds the performance of both OANN models, except on MAE, where it is poorer than SOA-ANN. The percentage improvement of SOA-ANN is 6.5%, calculated using Eq. (28).
From Table 3, it is observed that SOA-ANN outperforms ANN and CSA-ANN. The percentage improvement of SOA-ANN on each statistical parameter, obtained by substituting values into Eq. (28), is tabulated in Table 4.
ANN accuracy is high for short-term prediction, which is its advantage, but its performance deteriorates for midterm forecasting. To improve its performance, the training algorithm is modified via a global optimization algorithm; the resulting performance, tabulated in Tables 2 and 3, confirms the improvement.

Improvement% = |ANN_metric − SOA_metric| / ANN_metric × 100   (28)

On observing both the figures and the tables, it can be said that not every optimization-algorithm-based forecasting model improves on the performance of a conventional machine-learning-based prediction model.
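The percentage-improvement measure above can be written as a small helper. The example metric values below are placeholders for illustration, not the paper's tabulated results.

```python
def improvement_pct(ann_metric, soa_metric):
    """Percentage improvement of SOA-ANN over ANN for one statistical metric."""
    return abs(ann_metric - soa_metric) / ann_metric * 100

# Usage with hypothetical MAE values: ANN = 2.00, SOA-ANN = 1.87
example = improvement_pct(2.00, 1.87)
```

The same helper applies unchanged to MSE, MAPE, and R² values.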
The weights obtained after training the forecasting models are shown in tabular format in Table 5. These trained weights apply to this particular problem only, because they were obtained for a particular dataset (time, irradiation, and temperature, in that order).
The average training times for the ANN, DE-ANN, PSO-ANN, CSA-ANN, and SOA-ANN models are 445.4128, 177.5161, 128.2633, 75.6305, and 53.1360 s, respectively. The training time of SOA-ANN is the lowest of all the models; hence, it can be concluded that the OANN models are faster than the conventional model.

Conclusion
SPGF is important for power sector companies for energy bidding, scheduling, etc. To support power sector companies, optimization-based machine learning models, CSA-ANN and SOA-ANN, are developed. ML models have high accuracy for time-series data, but their accuracy is low for nonlinear data; an optimization technique is used to enhance the ML model. This work helps researchers incorporate computation techniques into machine learning models. The developed models were used to predict SPG for a day ahead, three days ahead, and a week ahead. Comparing the results, it is concluded that the ANN model predicts better than the OANN models for the short term, i.e., a-day-ahead prediction. For midterm prediction, SOA-ANN gives the best result among the compared ML models. ANN performance is enhanced, i.e., its limitations are overcome, by changing its training algorithm: incorporating the optimization algorithm into the training enables it to forecast the midterm as well. The percentage improvements of SOA-ANN in comparison with the ANN model are 6.54%, 16.05%, 1.67%, and 3.61% in terms of MAE, MSE, MAPE, and R², respectively. Hence, it can be concluded that ANN performance can be enhanced by an optimization algorithm, and the overall performance of SOA-ANN is good compared with the other methods.

(3) Remaining close to the best search agent: The closeness of the search agent to the best search agent is represented by 'D_s', which the search agent uses to update its position:

D_s = |C_s + M_s|   (9)

Fig. 4 Comparison of forecasted model output with actual data for three-day-ahead prediction. Fig. 5 Comparison of forecasted model output with actual data for week-ahead prediction. The results for short-term prediction are shown in Fig. 3, and those for midterm prediction in Figs. 4 and 5. Each of Figs. 3-5 compares the forecasted data with the actual data, as indicated in the legends: 'Actual' is the actual data against which the comparison is made, 'ANN' is the forecast of the ANN model, 'DE-ANN' the forecast of the differential-evolution-based ANN model, 'PSO-ANN' the forecast of the particle-swarm-optimization-based ANN model, 'CSA-ANN' the forecast of the crow-search-algorithm-based ANN model, and 'SOA-ANN' the forecast of the SOA-based ANN model.
CSA-ANN and SOA-ANN models are developed and compared with the conventional method, i.e., ANN, and the optimization-based models are compared with DE- and PSO-based ANN models, since these two global optimization algorithms are widely used. The trained weight values are shown in Table 5.

Table 4 Percentage improvement of SOA-ANN
Bold values represent the best values, based on which model performance is measured/evaluated