Forecasting of solar radiation for a cleaner environment using robust machine learning techniques

Intensified research is underway worldwide to expand renewable energy sources such as solar and wind, both to reduce emissions and meet global targets and to address depleting fossil fuel reserves and the growing energy demand of the population. Solar radiation (SR) is intermittent, so forecasting it is essential. The objective of this research is to apply modern machine learning techniques to forecast SR with high accuracy under different climatic conditions. The required dataset is collected from the National Solar Radiation Database and contains features such as temperature, pressure, relative humidity, dew point, solar zenith angle, wind speed, and wind direction, with Global Horizontal Irradiance (GHI, W/m2) as the target (y) parameter. The collected data is first split by type of climatic condition. Each climatic model was trained on various machine learning (ML) algorithms, namely multiple linear regression (MLR), support vector regression (SVR), decision tree regression (DTR), random forest regression (RFR), gradient boosting regression (GBR), and lasso and ridge regression, as well as a deep learning algorithm, long short-term memory (LSTM), using the Google Colab platform. From the analysis, LSTM has the least error, with a loss of 0.0040 at the 100th epoch. Among the ML models, gradient boosting and RFR score highest: for the hot season, gradient boosting leads RFR by 2%, while for the cold, autumn, and monsoon climates, RFR has 1% higher accuracy than gradient boosting. The high-accuracy model is deployed in a user interface (UI) that will be useful for real-time solar prediction; for load operators in maintenance scheduling, stock commitment, and load dispatch; for engineers at load dispatch centres deciding where to set up solar panels; and for household clients and future researchers.


Introduction
Renewable energy sources and their effective usage are strongly tied to the sizing, optimization, and operation of solar energy systems. Solar energy can be employed effectively as a limitless, clean, and environmentally advantageous source for power generation. The radiant energy that comes from the sun is known as solar energy; a tremendous amount of electromagnetic energy is produced by the thermonuclear fusion of hydrogen. Around 1.8 × 10^11 MW of solar energy is absorbed globally, with a mean solar radiation intensity of 1367 W/m2 above the Earth's atmosphere. Figure 1 displays the global distribution of solar energy intensity (world energy resources). The potential to harness solar energy in nations that lie between 45°N and 45°S latitude is enormous. South Asia, the Middle East, the Mojave Desert in the USA, the Atacama Desert in Chile, the Sahara Desert, the Kalahari Desert in Africa, and north-western Australia are all viable locations for large-scale PV projects. Solar engineers need a good understanding of solar resource availability when designing and building solar photovoltaic (PV)-based energy systems. Unfortunately, solar radiation data are not generally available, owing to expensive equipment, limited geographic coverage, and short record periods. Because of this dearth of observed data at the Earth's surface (Mosa et al. 2017; Mehrabankhomartash et al. 2017; Sekulima et al. 2016; Yang et al. 2014; Antonanzas et al. 2016a, 2016b; Fallah et al. 2018; Zekai 2004; Can et al. 2015; Mohammed et al. 2021; Jumin 2021), global solar energy forecasting has become increasingly important in recent years, as evidenced by the exhaustive literature reviews by Mellit and Kalogirou (2008). These methods can be divided into three groups based on the forecasting horizon: long-term, medium-term, and short-term PV power forecasting (Voyant et al. 2017; Li et al. 2018).
The long-term forecasting horizon is 1 month to 1 year. One week or less is considered short-term forecasting, whereas 1 week to 1 month is considered medium-term forecasting. Long-term PV power forecasting helps with long-term planning and decision-making for PV power generation, transmission, and distribution, and ensures the steady operation of the power system. Medium-term PV power forecasting aids the dispatching decision-making process for the power system. Short-term PV power forecasting can support the operation of the power system and hence improve its dependability (Pierro et al. 2017). To forecast PV power, physical techniques, persistence techniques, and statistical techniques can all be applied (Barbieri et al. 2017; Wang et al. 2017). Physical techniques use mathematical equations to represent the physical condition and dynamic motion of meteorological phenomena; in stable weather, physics-based forecasting models perform well. The bulk of persistence tactics assume that present and future values are closely related: to estimate the future values of the time series, it is assumed that nothing changes between time t and time t + Δt (Fu 2018). The forecasting precision of persistence-based models is significantly influenced by historical average values. Statistical models may frequently provide more accurate short-term PV power forecasting estimates, since historical PV generation values are taken into account and the model parameters are continuously optimized. For the Turkey region, Bayrakci et al. (2018) provided empirical methods for predicting global solar energy; in that work, 105 literature models were assessed with the aid of statistical validation tests.
The time span and the calibre of the input data have a significant impact on the performance of statistical procedures. The regression method (RM) (Sheng et al. 2018; Persson et al. 2017), support vector machine (SVM), autoregressive integrated moving average (ARIMA) models (Pedro and Coimbra 2012), autoregressive and moving average (ARMA) models (Lan et al. 2011; Chu et al. 2015), and extreme learning machine (ELM) (Hossain et al. 2017; Deo et al. 2017) are popular statistical techniques used in PV power forecasting. According to Suganthi et al., fuzzy logic-based models have been applied to solar, wind, bioenergy, microgrid, and hybrid renewable energy systems. That study found that the usage of fuzzy-based models for site evaluation, PV system installation, power point tracking, and system optimization has increased significantly in recent years (Deo et al. 2017). Recently, Perveen et al. suggested utilizing meteorological characteristics, including dew point, sunshine duration, ambient temperature, wind speed, and relative humidity, to develop a sky-condition-based model that uses fuzzy logic modelling to estimate global solar energy. That study showed that adding dew point as a meteorological component considerably increased the model's accuracy (Colak and Kaya 2017).
To overcome the drawbacks of single machine learning models, ensemble approaches such as random forest (Anuradha and Moharail, 2022) and gradient boosting algorithms are employed to increase the system's efficiency. However, solar radiation is a dependent variable, and a yearly cumulative forecasting system is needed to obtain accurate results; maintaining accuracy for such datasets in complex systems with massive data would be challenging using fuzzy logic modelling. Data-driven ANN-based artificial intelligence techniques that can subsequently perform structural simulation have therefore been introduced; the ANN is the most effective tool for modelling complex, dynamic, and non-linear systems (Yadav et al. 2017; Eseye et al. 2016; Chaudhary and Rizwan 2018; Voyant et al. 2017; Kumar and Kalavathi 2018). The rapid advancement of artificial intelligence (AI) techniques has led to the creation and application of deep learning-based models in several domains (Youssef et al. 2017); deep learning is a relatively new subset of machine learning techniques. Convolutional neural networks (CNN) (Zang et al. 2018) and recurrent neural networks (RNN) (Yona et al. 2013) are two deep learning-based models that have previously shown promise in the prediction of PV power. Deep learning algorithms can extract deep information from the PV power series and produce better forecasting results than traditional physical, persistence, and statistical techniques. Therefore, in this research, all the optimal models are developed and used to forecast solar radiation for different seasons, and among the various optimal methods, the most feasible and accurate are identified.
The dataset for this study was compiled from the National Solar Radiation Database (NREL) for the years 2016 through 2021. The data is first divided into the different seasons (summer, winter, autumn, and monsoon) while taking into account different forecasting horizons, such as 1 h, 1 day, 1 week, 3 to 6 months, and 1 year ahead. The final data is preprocessed, scaled, and made ready for training. The machine learning techniques employed are multiple linear regression, support vector machine, random forest, decision tree, gradient boosting, and lasso and ridge regression; the deep learning techniques are recurrent neural networks, particularly long short-term memory (LSTM). The Google Colab platform is used to analyse and compare the overall performance of the above-mentioned machine learning algorithms. Higher-accuracy results are expected from the additional work done here, which is further deployed for use by load dispatch centres, grid operators, the solar industry, manufacturers, power plant developers, rural small-scale clients, and researchers.

System architecture
A formal description of the conceptual idea, the schematic flow of the ML work, and the overall factors considered in the system is given in detail below.

Overview
The structural overview of the whole system is shown in Fig. 2.
An inaccurate forecast may cause considerable cost wastage, and sometimes the demands that need to be satisfied may not be met. The accuracy of forecasted solar radiation is therefore an important parameter to note. Training the model with generalized data, i.e. global solar radiation data, will not suffice; in this research work, as mentioned before, the model has been trained using local solar radiation data.

Methodology
The overall techniques and specific procedures used to train the models and to compare the most effective model for forecasting solar radiation are the focus of this research work. This section allows the overall validity and reliability of the study to be assessed. Machine learning implementations are classified by the nature of the learning "signal" or "response" available to the learning system, as follows.

Unsupervised learning
When an algorithm learns from plain examples with no associated response, it is up to the algorithm to deduce the patterns in the data on its own. This kind of technique restructures the data into new features that may identify a class or a new set of uncorrelated values. Such techniques are quite handy for giving supervised machine learning algorithms new, beneficial inputs, as well as insights into the significance of the data.

Regression
When the outputs are continuous rather than discrete, this is a supervised problem as well, and it is the setting relevant to this research work. In this paper, regression is the main approach employed.

Data preprocessing
Preprocessing is a data mining method for converting unusable data into an effective format.

Data cleaning
There could be many unnecessary or missing data segments. Data cleaning is performed to remove these unwanted components; it involves dealing with missing data, noisy data, and other issues.

Missing data
There are many approaches to handling missing data. Here are a few illustrations: Ignore the tuples: this method should only be used when a large number of values in a tuple are missing.
Fill in the blanks: the missing values can be filled in manually, with the attribute mean, or with the most likely value.

Noisy data
Data that is meaningless and cannot be interpreted by machines is referred to as noisy data. Regression can be used to smooth such data by fitting it to a regression function.

Identifying y-parameter scaling
In this paper, GHI (Global Horizontal Irradiance) is taken as the y-parameter, as it is directly related to solar radiation and is expressed in W/m2. Solar radiation has an intensity of roughly 1380 W/m2 above the Earth's atmosphere. GHI indicates how much energy can be captured from the sun with solar panels.

Data transforming
This method is used to transform the data into a format suitable for the mining phase. The following techniques are used to achieve this: Feature scaling: one of the most significant data preprocessing steps in machine learning. If the data is not scaled, algorithms that compute distances between features are skewed towards numerically larger values. Feature scaling strategies such as normalization and standardization are the most common, while also being the most frequently confused.
Normalization (or min-max scaling) is a technique for transforming features onto a similar scale. The new point is computed as x' = (x − min) / (max − min), which reduces the range to [0, 1] (or [−1, 1] in some cases). Standardization, or Z-score scaling, transforms features by subtracting the mean and dividing by the standard deviation: z = (x − μ) / σ.
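The two scaling formulas above can be illustrated with sklearn's built-in scalers; the toy feature column below is an assumption for demonstration only, not data from the study.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature column (e.g. temperature readings); values are illustrative only.
X = np.array([[10.0], [20.0], [30.0], [40.0]])

# Min-max normalization: x' = (x - min) / (max - min), range [0, 1]
minmax = MinMaxScaler().fit_transform(X)

# Standardization (Z-score): z = (x - mean) / std
zscore = StandardScaler().fit_transform(X)

print(minmax.ravel())               # values scaled into [0, 1]
print(zscore.mean(), zscore.std())  # approximately 0.0 and 1.0
```

Either scaler is fit on the training set only and then applied to the test set, so no information leaks from the test data into the scaling parameters.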

Splitting training and testing set
In the present case, model training and evaluation are not done using the same dataset. Since the objective is to build a reliable machine learning model, the dataset was divided into a training set and a test set; otherwise, the results would be biased and would give a false impression of a high-accuracy model. Hence, splitting the data into training and test sets is of the utmost importance here.

Performed machine learning models for solar forecasting
Various ML models are employed and analysed for the seasonal periods which are explained in detail below.

Multiple linear regression
It is the most basic and often-used kind of predictive analysis. MLR is a prevalent regression approach that models the linear relationship between a single continuous dependent variable and multiple independent variables. Figure 3 shows the structural representation of MLR. By fitting a linear equation, it aims to model the relationship between two or more features and the response, identical to simple linear regression in terms of steps.
The sklearn library is used in the project; it contains essential tools for statistical modelling and machine learning, including regression. The dataset is first cleansed and split into training and test sets, with 80% of the data used for training and the remainder for testing, and feature-scaled by standardization so that all features lie between roughly −3 and +3. The overall accuracy score of MLR without a seasonal split is 0.8277, with a root mean square error (RMSE) of 118.224 and a mean absolute error (MAE) of 83.6 (Table 1).
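A minimal sketch of this pipeline (80/20 split, standardization, MLR, then RMSE/MAE) is shown below. The synthetic data stands in for the NSRDB features; the seven columns and the GHI-like target are assumptions for illustration, not the study's actual data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for the weather features (temperature, pressure, ...).
X = rng.normal(size=(500, 7))
y = X @ rng.normal(size=7) * 100 + rng.normal(scale=10, size=500)  # GHI-like target

# 80/20 split, then standardize (scaler fit on the training split only)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
model = LinearRegression().fit(scaler.transform(X_tr), y_tr)

pred = model.predict(scaler.transform(X_te))
rmse = mean_squared_error(y_te, pred) ** 0.5
mae = mean_absolute_error(y_te, pred)
r2 = r2_score(y_te, pred)
print(r2, rmse, mae)
```

The same split-scale-fit-score skeleton is reused for the other regressors in this section, with only the model class changing.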

Support vector regression
SVR is a machine learning model that uses the support vector machine to predict a continuous variable. SVR employs an epsilon-insensitive tube to fit the best line within a collection of values, whereas linear regression models use the line of best fit to lessen the discrepancy between actual and predicted values. The equation of the line is the same in support vector regression as in linear regression, but in SVR this line is referred to as a hyperplane. Support vectors, which are used to plot the boundary lines, are the data points on each side of the hyperplane that lie near it. The structural representation of SVR is shown in Fig. 4, which shows how several input levels are converted into hidden layers to display the output.
The SVR seeks to fit the best line within a threshold value (the distance between the hyperplane and the boundary line), as opposed to other regression models that aim to minimize the error between the real and predicted value. As a result, the SVR model tries to meet the requirement −a ≤ y − (wx + b) ≤ a, and it uses the points along this boundary to predict the value. The support vector regression model is created using support vector machine (SVM) software; the libsvm library provides a strong interface with which SVM implementation is rapid and simple. It also facilitates probabilistic classification using the kernel approach. Common kernels, including linear, RBF, sigmoid, and polynomial, are included.
While the preceding explanations focus on the linear case, SVM and SVR algorithms may also handle non-linear situations using a kernel approach. A kernel is a function (selected from several options) that takes a non-linear problem and converts it into a linear problem, which the algorithm can then solve in a higher-dimensional space. The RBF kernel is popular because of its resemblance to the K-nearest neighbour algorithm: because RBF-kernel support vector machines only need to store the support vectors during training, and not the complete dataset, the method retains the advantages of K-NN while avoiding its space complexity problem. The ultimate purpose here is to predict and forecast solar radiation, and of all the kernels mentioned above, the RBF (radial basis function) kernel is the most appropriate for this research work.
The overall accuracy score of SVR without seasonal split is 0.5787. The RMSE score of SVR is 147.05 and MAE is 81.73. Table 2 describes the seasonally split climatic score of SVR, their RMSE, and MAE value.
Since the support vector classifier places data points above and below the classifying hyperplane, there is no probabilistic justification observed; hence, SVR lags in accuracy, as seen in Table 2.
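An RBF-kernel SVR can be sketched as follows on a simple non-linear signal; the sine-wave data and the hyper-parameters (C, epsilon) are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # non-linear target

# epsilon sets the half-width of the insensitive tube; errors inside it
# carry no penalty, so only points at or outside the tube become support vectors.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
svr.fit(X, y)
r2 = svr.score(X, y)
print(r2)
```

Swapping `kernel="rbf"` for `"linear"`, `"poly"`, or `"sigmoid"` exercises the other kernels mentioned above without changing the rest of the pipeline.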

Decision tree regression
The decision tree approach, unlike some other supervised learning algorithms, may be employed to solve both regression and classification problems. The goal of employing a decision tree is to create a training model that can be used to predict the class or value of target variables by learning decision rules inferred from prior (training) data. The decision tree algorithm attempts to solve the problem using a tree representation, in which each internal node represents an attribute and each leaf node represents a class label.
When predicting a class label for a record, the process starts from the root of the tree. The values of the root attribute and the record's attribute are compared, and based on the comparison, the branch corresponding to that value is followed to the next node. By comparing the attribute values of the record with the internal nodes of the tree until a leaf node is reached, the expected class value is obtained. Figure 5 shows the systematic representation of DTR, with the flow from the root node through interior nodes to leaf nodes.

Information gain
To evaluate the information contained in each attribute, using information gain as a criterion, some ideas from information theory are used. Entropy is a metric for determining the unpredictability or uncertainty of a random variable X. In a binary classification problem there are just two classes, positive and negative. If all of the instances are positive or all are negative, the entropy is zero; entropy is one if half of the records are of the positive class and half are of the negative class (Jeremiah Lutes, 2020).
The information gain of each attribute can be calculated from its entropy measure: information gain estimates the expected reduction in entropy due to sorting on the attribute. For regression trees, the analogous criterion is standard deviation reduction, i.e. the drop in standard deviation after a dataset is split on an attribute. Finding the attribute that yields the largest standard deviation reduction is the key to building a decision tree. Table 3 describes the seasonally split climatic scores of the decision tree, with their RMSE and MAE values.
Compared to SVR and multiple linear regression, the decision tree achieves the highest accuracy, and winter's score is comparatively high relative to the other seasons.
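The standard deviation reduction criterion described above can be sketched in a few lines; the GHI values and the "cloud cover below 50%" split are hypothetical, chosen only to show that a good split produces a large reduction.

```python
import numpy as np

def std_reduction(target, split_mask):
    """Drop in standard deviation after splitting `target` by a boolean mask."""
    before = np.std(target)
    left, right = target[split_mask], target[~split_mask]
    weighted = (len(left) * np.std(left) + len(right) * np.std(right)) / len(target)
    return before - weighted

# Hypothetical GHI values split on a hypothetical "cloud cover below 50%" attribute
ghi = np.array([100.0, 120.0, 110.0, 700.0, 650.0, 680.0])
mask = np.array([True, True, True, False, False, False])
print(std_reduction(ghi, mask))  # large reduction, so this is a good split attribute
```

A regression tree greedily picks, at each node, the attribute and threshold giving the largest such reduction.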

Random forest regression
RFR is a versatile, user-friendly machine learning method that provides excellent results most of the time, even without hyper-parameter tuning. Simply put, a random forest constructs many decision trees and blends them to obtain a more accurate and consistent forecast. A random forest is an ensemble approach that combines several decision trees using a technique known as bootstrap aggregation, or bagging, which is shown in Fig. 6.
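Bagging as described above can be sketched with sklearn's random forest; the synthetic non-linear data and the choice of 200 trees are assumptions for illustration, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 7))
y = np.sin(X[:, 0]) * 100 + X[:, 1] ** 2 * 50 + rng.normal(scale=5, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
# Each tree is trained on a bootstrap sample of the training data;
# the forest's prediction is the average of the trees (bagging).
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
score = rf.score(X_te, y_te)
print(score)
```

Averaging many deep, decorrelated trees mainly reduces variance, which is why RFR holds up well without careful hyper-parameter tuning.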

L1 regularization
When a regression model employs the L1 regularization approach, it is referred to as lasso regression; ridge regression is used when the L2 regularization approach is applied. L1 regularization adds a penalty proportional to the absolute value of the coefficients' magnitudes. This kind of regularization can produce sparse models with few coefficients: some coefficients may reach exactly 0 and so be removed from the model, and larger penalties result in coefficient values closer to zero (ideal for producing simpler models). L2 regularization, on the other hand, does not produce sparse models or eliminate coefficients. Figure 7 shows the approach of lasso and ridge regression.

Mathematical equation of lasso regression
Cost = residual sum of squares + λ × (sum of the absolute values of the coefficient magnitudes), where λ denotes the amount of shrinkage. λ = 0 implies all features are considered; the model is then equivalent to linear regression, where only the residual sum of squares is used to build the predictive model. λ = ∞ implies no feature is considered, i.e. as λ approaches infinity, more and more features are eliminated.
The bias increases as λ increases, and the variance increases as λ decreases.

Ridge regression
Ridge regression is a model-tuning technique used to analyse data that suffer from multicollinearity; it performs L2 regularization. When there is a multicollinearity problem, least squares estimates are unbiased but their variances are enormous, resulting in predicted values that are far from the actual values.

Ridge regression's cost function
Every regression machine learning model is founded on the standard regression equation, Y = XB + e, where Y is the dependent variable, X is the independent variable, B is the vector of regression coefficients to be calculated, and e is the residual error. Ridge regression adds the lambda term to this equation to account for variance that the generic model does not capture, giving the cost function Min(||Y − X(θ)||² + λ||θ||²). Once the data is prepared and identified as a candidate for L2 regularization, these are the steps one can undertake. Tables 4 and 5 describe the seasonally split climatic scores of lasso and ridge regression.
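The qualitative difference between the two penalties can be sketched as follows; the sparse ground-truth coefficients and the alpha value are assumptions chosen to make the contrast visible.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
true_coef = np.array([80.0, -50.0, 30.0, 0, 0, 0, 0, 0, 0, 0])  # sparse ground truth
y = X @ true_coef + rng.normal(scale=5, size=300)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1: drives some coefficients exactly to 0
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks coefficients, none become 0

print((lasso.coef_ == 0).sum())  # several exact zeros
print((ridge.coef_ == 0).sum())  # typically none
```

This is the mechanism behind lasso's implicit feature selection, while ridge merely shrinks all coefficients towards zero.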

Gradient boosting regression
One of the most effective methods in machine learning is the gradient boosting technique. Machine learning errors are broadly categorized into two types: bias errors and variance errors. As one of the boosting strategies, gradient boosting is used to decrease the model's bias error. The gradient boosting algorithm's base estimator is fixed, i.e. a decision stump. The n_estimators parameter of the gradient boosting method can be tweaked, for example when using AdaBoost; however, if no value is specified for n_estimators, the default for this algorithm is 100.
The gradient boosting approach may be used to forecast not only continuous target variables (as a regressor) but also categorical target variables (as a classifier). When used as a regressor, the cost function is mean squared error (MSE); when used as a classifier, the cost function is log loss.
Gradient boosting is made up of three parts: a loss function to be optimized; a weak learner to make predictions; and an additive model that incorporates weak learners to reduce the loss function.

Improvement in gradient boosting
As gradient boosting is a greedy method, it can easily overfit a training dataset. Regularization strategies that penalize various parts of the algorithm can help by decreasing overfitting and generally enhancing performance. These strategies include tree constraints, shrinkage, random sampling, and penalized learning. Figure 8 shows the graph of the gradient boosting loss function.
The minimum improvement to loss is a limitation on the improvement of any split added to a tree. Apart from random forest, the other important ensemble method used in this project is the gradient boosting algorithm. The gradient boosting regressor is imported from the ensemble package of sklearn, and the data is split with 80% in the training set and 20% in the testing set. Table 6 describes the seasonally split climatic scores of the gradient boosting algorithm.
It was noticed in Fig. 9 that the coefficients of lasso and ridge are more or less the same, which is why the accuracy table shows similar scores for both lasso and ridge regression.
The result of gradient boosting has higher accuracy compared to the other regressors considered previously, not only on the overall data but also on the seasonally split data. Compared to all the other regression algorithms employed, gradient boosting regression, one of the ensemble methods, results in higher accuracy and lower root mean square error and mean absolute error values. The overall accuracy score of GBR without a seasonal split is 0.8967.
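The sklearn workflow described above can be sketched as follows; the synthetic data and learning rate are illustrative assumptions, while n_estimators=100 matches the stated default.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 7))
y = np.where(X[:, 0] > 0, 600.0, 100.0) + X[:, 1] * 40 + rng.normal(scale=10, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
# n_estimators defaults to 100; each new tree is fitted to the residual
# errors of the ensemble so far, reducing bias step by step (squared-error loss).
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1).fit(X_tr, y_tr)
score = gbr.score(X_te, y_te)
print(score)
```

The learning_rate here is the shrinkage regularizer mentioned earlier: smaller values slow each boosting step and usually need more trees but overfit less.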

Long-short-term memory
LSTM (short for long short-term memory) is mostly used to solve the vanishing gradient problem in backpropagation. The memorizing process in LSTMs is controlled by a gating mechanism: gates that open and close allow information to be saved, written, or read. These gates store memory in an analog format, performing element-wise multiplication with sigmoid outputs ranging from 0 to 1. The analog form is excellent for backpropagation because it is differentiable. Let us look at how an LSTM is built (Malakar et al. 2021). Figure 9 shows the sigmoid and tanh functions of the LSTM.
The activation function tanh is non-linear; it regulates the values flowing through the network, keeping them between −1 and 1. To prevent information from fading, a function whose second derivative sustains over a long range before going to zero is required. The sigmoid function is another non-linear activation function, used in the gates; unlike tanh, it keeps values between 0 and 1, which helps the network update or erase data.
If the multiplication yields zero, the data is considered lost; similarly, if the value is 1, the information remains. Figure 10 depicts the LSTM cell with the following gates: forget, input, cell, and output gates.

Forget gate
The forget gate determines which data needs attention and which can be overlooked. Information from the current input X(t) and the hidden state h(t−1) is passed through the sigmoid function, which generates values ranging from 0 to 1 and determines whether a portion of the previous output is required (an output closer to 1 means it is). The cell later uses this value f(t) for point-wise multiplication.

Input gate
To update the cell state, the input gate performs the following process: the current input X(t) and the previously hidden state h(t−1) are fed to a second sigmoid function, which transforms the values to between 0 (not important) and 1 (important).

Cell gate
The forget gate and input gate have given the network enough information; the information for the new state must then be computed and stored in the cell state. The forget vector f(t) multiplies the prior cell state C(t−1); if the result is 0, the corresponding values in the cell state are dropped.

Output gate
The output gate determines the value of the next hidden state, which stores information from earlier inputs. The current input and previous hidden state values are first sent to a third sigmoid function; the tanh function is then applied to the new cell state. These two outputs are multiplied element-wise, and based on the final value, the network selects which information the hidden state should carry. Prediction is based on this hidden state. The new cell state and hidden state are then passed forward to the next time step; finally, the forget gate chooses which information from earlier steps is necessary and proceeds further. This comprises the technical flow of the LSTM methodology. In this research, the 2016-2020 data is used as the training set and the 2021 data as the testing set. Feature scaling for this algorithm involves normalization rather than standardization: normalization scales the features to the range −1 to +1, which the model processes better when predicting solar irradiance. MinMaxScaler is used for preprocessing here. From the Keras library, the Sequential, LSTM, and Dropout models are imported.
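The gate equations described above can be condensed into a single numpy time step; this is a didactic sketch of a generic LSTM cell with random weights, not the trained network used in the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters of the
    forget (f), input (i), cell candidate (g), and output (o) gates."""
    z = W @ x_t + U @ h_prev + b                    # shape (4 * units,)
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)    # gate activations in (0, 1)
    g = np.tanh(g)                                  # candidate values in (-1, 1)
    c_t = f * c_prev + i * g        # forget part of the old state, add new info
    h_t = o * np.tanh(c_t)          # hidden state exposed to the next time step
    return h_t, c_t

units, features = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * units, features))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
h, c = lstm_step(rng.normal(size=features), h, c, W, U, b)
```

Because h is the product of a sigmoid gate and a tanh of the cell state, every component of the hidden state stays strictly inside (−1, 1), matching the value ranges discussed above.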
The RNN is initialized with the Sequential method. The first LSTM layer is added with 50 units, with a dropout of 0.2 applied via the Dropout method and return_sequences set to true.
By zeroing the values of a random fraction of a layer's neurons at each training step, dropout blacks those neurons out. The dropout rate is the percentage of neurons that are cancelled; the values of the remaining neurons are scaled so that the total sum of neuron values remains roughly constant.
The second and third LSTM layers are added in the same way, each with 50 units, a dropout of 0.2, and return_sequences set to true. The fourth LSTM layer is added with 50 units and a dropout of 0.2, but with no return sequence: since it is the final LSTM layer, the sequence need not be returned. The output layer is added using the Dense method with units equal to 1; optimizing the output layer is also the most important part of evaluating and validating the LSTM model. The model is compiled with the Adam optimizer and the mean squared error loss function; Adam is the best of the adaptive optimizers in most cases. Fitting the regressor to the training set with 100 epochs and a batch size of 32, the loss at the end of the 100th iteration is 0.0040, indicating well-predicted solar radiation.
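The layer stack described above can be sketched in Keras as follows. The look-back window length (TIMESTEPS) is an assumption, as the paper does not state it; the layer sizes, dropout rate, optimizer, loss, epochs, and batch size follow the text.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

TIMESTEPS = 60  # assumed look-back window; not specified in the paper

model = Sequential()
# Four stacked LSTM layers of 50 units, each followed by 20% dropout;
# return_sequences=True on all but the last, so each layer feeds a sequence.
model.add(LSTM(50, return_sequences=True, input_shape=(TIMESTEPS, 1)))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50))            # final LSTM layer: no return_sequences
model.add(Dropout(0.2))
model.add(Dense(1))            # single-value GHI output
model.compile(optimizer="adam", loss="mean_squared_error")
# model.fit(X_train, y_train, epochs=100, batch_size=32)
```

The fit call is shown commented out because the windowed training arrays depend on the chosen TIMESTEPS and on the MinMaxScaler-normalized NSRDB series.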

Results and discussion
Different machine learning algorithms are applied in this work. The data used covers 2016-2020 overall and is also split based on climatic conditions: summer, winter, monsoon, and autumn.

Machine learning results
The comparison chart of different algorithms on various seasons of the considered data is shown below in Table 7.
From the comparison table, it is inferred that the ensemble approaches to forecasting solar irradiance achieve high accuracy in contrast with the other machine learning algorithms. Under the seasonal split, GBR performs best for the period March to May, and random forest performs best for the rest of the periods between June and February. For time series forecasting of solar irradiance, a deep-architecture recurrent neural network, particularly LSTM, is deployed with a low error approximation value. The detailed pictorial performance representation of each algorithm's results is shown below. Figure 11 depicts the performance of solar radiation forecasting in the winter season, covering November, December, January, and February; from this graph, it is inferred that the decision tree and random forest methods track the actual data well, and, compared with the decision tree, the random forest has a higher accuracy of 95.62% in cold climates. Figure 12 covers the summer months between March and May; from this representation, it is inferred that lasso and random forest fall short, while the decision tree and XGBoost track the actual values well, with XGBoost scoring 97.2%. Figures 13 and 14 depict the autumn and monsoon seasons. Figure 15 shows the results of fitting the regressor to the training set with 100 epochs and a batch size of 32. Unlike the other methodologies, the approach followed for the LSTM technique involves multiple layers of training with multiple units, and at the end of each layer the accuracy showed a noticeable improvement. At the final layer, the loss value is 0.0040 at the end of the 100th iteration, indicating significantly better-predicted solar radiation. Using the matplotlib library package, the actual and predicted solar irradiance trained by the LSTM method, particularly for one year, is plotted, as shown below.
The LSTM model is first trained with the 2016-2020 data, which, as seen above, achieved a loss of 0.0040, and is then tested with the 2021 data. The graph is plotted between the 2021 dates and the solar irradiance (GHI). It can be seen from Fig. 16 that the output of the LSTM model has higher accuracy than the previous methodologies. The LSTM model also has a low error approximation for mixed and cloudy climates (Yu et al. 2019).
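Before an LSTM can be fit, the GHI series must be reshaped into fixed-length input windows with next-step targets. A minimal sliding-window sketch is shown below; the `lookback` length and the toy series are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def make_sequences(series, lookback):
    """Turn a 1-D series into (samples, lookback, 1) windows and next-step targets."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])   # window of past GHI values
        y.append(series[i + lookback])     # value immediately after the window
    X = np.asarray(X, dtype=np.float32).reshape(-1, lookback, 1)
    y = np.asarray(y, dtype=np.float32)
    return X, y

ghi = np.arange(10, dtype=np.float32)      # stand-in for hourly GHI values
X, y = make_sequences(ghi, lookback=3)     # X.shape == (7, 3, 1)
```

Arrays of this shape can then be fed to a stacked LSTM (e.g. trained for 100 epochs with batch size 32, as described above), with the 2016-2020 windows for training and the 2021 windows for testing.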

Implementation and deployment
The overall machine and deep learning algorithms for the different seasons and periods considered were implemented in a user-interface (UI) web-based application. The Django REST framework is used for building the UI; it is one of the easiest tools for integrating machine learning models with a web UI, and it is a solid and adjustable platform for creating Web APIs. Reasons to use the REST framework include its excellent community support and extensive documentation; Mozilla, Red Hat, Heroku, and Eventbrite are just a few of the well-known enterprises that use and trust it.

Firstly, the trained and validated model is converted into a .sav file using the Python package joblib. Joblib is a collection of Python tools for implementing lightweight pipelining; in particular, its transparent disk-caching of functions and lazy re-evaluation (the memoize pattern) make parallel computing straightforward. Joblib can persist any data structure or machine learning model. Because it can serialize Python objects directly to and from files, it is a convenient substitute for Python's standard library: pickle is replaced by joblib.dump() and joblib.load(), which work efficiently on arbitrary Python objects storing huge data, particularly large numpy arrays. The .sav file generated by joblib is saved in the Django project directory and used in the subsequent procedure.

The template for the project is built using HTML, CSS, Bootstrap, and JavaScript, where a form is used to input the machine learning model's features. Once the user enters the UI, a request is prompted for location access, obtaining the user's latitude and longitude via navigator.geolocation.getCurrentPosition(position). Based on the period selected by the user and the user's latitude and longitude, the future temperature, pressure, wind direction, wind speed, surface albedo, cloud type, and dew point for the location are predicted using the OpenWeatherMap API.
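The joblib persistence step described above can be sketched minimally as follows; the file name `ghi_model.sav` and the toy linear model are illustrative assumptions standing in for the actual trained forecasting model:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy stand-in for the trained forecasting model.
X = np.array([[10.0], [20.0], [30.0]])   # e.g. a single temperature feature
y = np.array([100.0, 200.0, 300.0])      # e.g. the GHI target
model = LinearRegression().fit(X, y)

# joblib.dump()/joblib.load() replace pickle for large numpy-backed objects.
path = os.path.join(tempfile.gettempdir(), "ghi_model.sav")
joblib.dump(model, path)
restored = joblib.load(path)             # reloaded inside the Django view
```

In deployment, the `.sav` file sits in the Django project directory and the view calls `restored.predict(...)` on the form inputs.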
From the JSON response, the input parameters are extracted and auto-filled in the UI form. With the user's latitude and longitude, the solar zenith angle is calculated using the aforementioned formula and auto-filled as well. Once the user clicks the submit button, the input parameters are passed as an array to the backend, and the predicted ML results are displayed in the UI as shown in Fig. 17.
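The zenith-angle formula referred to above is not reproduced in this excerpt; one common formulation, a sketch assuming Cooper's declination approximation together with the standard hour-angle relation (our assumption, not necessarily the paper's exact formula), is:

```python
import math

def solar_declination_deg(day_of_year: int) -> float:
    # Cooper's approximation of the solar declination, in degrees.
    return 23.45 * math.sin(math.radians(360.0 / 365.0 * (284 + day_of_year)))

def solar_zenith_angle_deg(latitude_deg: float, day_of_year: int,
                           hour_angle_deg: float) -> float:
    # cos(theta_z) = sin(phi)sin(delta) + cos(phi)cos(delta)cos(h)
    phi = math.radians(latitude_deg)
    delta = math.radians(solar_declination_deg(day_of_year))
    h = math.radians(hour_angle_deg)
    cos_tz = (math.sin(phi) * math.sin(delta)
              + math.cos(phi) * math.cos(delta) * math.cos(h))
    cos_tz = max(-1.0, min(1.0, cos_tz))   # guard against rounding drift
    return math.degrees(math.acos(cos_tz))
```

At the equator at solar noon on the March equinox (day 81, hour angle 0), the function returns a zenith angle of essentially 0 degrees, i.e. the sun directly overhead.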

Conclusion
This work performs a comprehensive study of the mechanisms and techniques of most regression algorithms for solar irradiance forecasting. As mentioned in the section above, there is a large body of machine learning-based solar radiation forecasting in the literature. The simpler dataset approach of splitting the data based on climatic conditions helps to increase the accuracy and makes this work stand out. The algorithms studied include multiple linear regression, support vector machine, decision tree, lasso and ridge regression, random forest, and XGBoost. Beyond regression algorithms, this research work also employs recurrent neural networks, especially long short-term memory (LSTM), as a time-series solar irradiance forecasting model.
Additionally, a distinctive treatment of the features, such as the seasonal split and the periods under consideration in the forecasting horizons, contributes to the accuracy and precision of the forecasting prototypes. The present work also shows that the LSTM methodology achieves higher accuracy than traditional machine learning models. Several important conclusions from the performed and trained models are listed below.
- The seasonal split plays a major role in the precision and accuracy of the forecasting results. For example, in harsh climatic conditions, certain forecasting models may have worse accuracy, while others perform better. Weather categorization is therefore one of the most important variables in improving the prognostic accuracy of solar irradiance forecasting models, which necessitates including weather types during forecasting.
- Different periods under consideration have a noteworthy impact on the precision of a model's prediction outcomes. The performance of a solar irradiance forecasting model frequently degrades as the forecasting horizon lengthens. As a result, prediction models should be chosen based on the forecasting horizon.
- Numerous deep learning algorithms exist for forecasting solar irradiance. Some of the more commonly used ones (LSTM, RNN) are computationally expensive while also being extremely accurate. Others, such as MLR, SVR, decision tree, lasso, and ridge, are cheaper to compute but show slow convergence and lower efficiency. To summarize, LSTM is best suited for solving complicated time-series forecasting problems and should be studied further.
The comparative analysis of various ML algorithms and deep structured learning LSTM models presented in this work can assist upcoming solar energy researchers, planners, and forecasting specialists by allowing them to choose the best model to improve the performance of their forecasting models.
From the analysis, LSTM has the least error approximation, with a loss of 0.0040 at the 100th epoch. Of all the ML models, gradient boosting and random forest rank highest: for the summer season, gradient boosting leads random forest by 2%, while for the winter, autumn, and monsoon seasons, random forest has a 1% higher accuracy score than gradient boosting. These high-accuracy models are deployed in a user interface for easy access by users, which will be useful for real-time solar prediction, for load operators in maintenance scheduling, stock commitment, and load dispatch centres, for engineers deciding on setting up solar power plants, and for household clients and future researchers.