Forecasting Photosynthetic Photon Flux Density Under Cloud Effects: Novel Predictive Model Using Convolutional Neural Network Integrated With Long Short-term Memory Network


Forecast models of solar radiation that incorporate cloud effects are useful tools for evaluating the impact of stochastic cloud movement, supporting real-time integration of photovoltaic energy into power grids, minimising skin cancer and eye disease risk through solar ultraviolet (UV) index prediction, and modelling bio-photosynthetic processes through solar photosynthetic photon flux density (PPFD). This research develops a deep learning hybrid model (CNN-LSTM, hereafter CLSTM) that factors in cloud effects by integrating the merits of convolutional neural networks with long short-term memory networks to forecast near real-time (i.e., 5-minute) PPFD in a sub-tropical region of Queensland, Australia. The proposed CLSTM model is trained with real-time sky images that depict stochastic cloud movements, captured through a Total Sky Imager (TSI-440). Advanced sky image segmentation converts the cloud chromatic features into statistical values, so that cloud variation is deliberately factored into the optimisation of the CLSTM model. The model and its competing algorithms (i.e., CNN, LSTM, deep neural network, extreme learning machine and multivariate adaptive regression spline) are trained with 17 distinct cloud cover inputs that consider the chromaticity of red, blue, thin and opaque cloud statistics, supplemented by solar zenith angle (SZA), to predict short-term PPFD. The models developed with cloud inputs yield accurate results and outperform the SZA-based models, while the best testing performance is recorded by the objective method (i.e., CLSTM) tested over a 7-day measurement period.
Specifically, using cloud cover in addition to the SZA as an input, CLSTM yields a testing performance with correlation coefficient r = 0.92, root mean square error RMSE = 210.31 μmol of photons m⁻² s⁻¹, mean absolute error MAE = 150.24 μmol of photons m⁻² s⁻¹, relative errors of RRMSE = 24.92% and MAPE = 38.01%, Nash-Sutcliffe coefficient ENS = 0.85, and Legates and McCabe's Index LM = 0.68. The study shows the importance of including clouds when forecasting solar radiation and evaluating risk, with practical implications for monitoring solar energy, greenhouses and high-value agricultural operations affected by the stochastic behaviour of clouds. Additional methodological refinements, such as retraining the CLSTM model for hourly and seasonal time scales, may aid agricultural crop farming and environmental risk evaluation applications such as predicting the solar UV index and direct normal solar irradiance for renewable energy monitoring systems.
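The performance metrics quoted above can be illustrated with a minimal sketch. The abstract does not give the formulas, so the code below assumes the commonly used definitions of each score (RRMSE as RMSE relative to the observed mean, MAPE relative to each observation, ENS and LM relative to deviations from the observed mean). The function name `forecast_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def forecast_metrics(obs, pred):
    """Return r, RMSE, MAE, RRMSE (%), MAPE (%), ENS and LM for paired series.

    Assumes the standard textbook definitions; observed values must be
    non-zero for MAPE and must not all be identical.
    """
    n = len(obs)
    if n == 0 or n != len(pred):
        raise ValueError("obs and pred must be non-empty and of equal length")

    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    err = [p - o for o, p in zip(obs, pred)]
    dev_o = [o - mean_o for o in obs]
    dev_p = [p - mean_p for p in pred]

    sq_err = sum(e * e for e in err)        # sum of squared errors
    abs_err = sum(abs(e) for e in err)      # sum of absolute errors
    var_o = sum(d * d for d in dev_o)       # observed sum of squares
    var_p = sum(d * d for d in dev_p)       # forecast sum of squares
    cov = sum(a * b for a, b in zip(dev_o, dev_p))
    rmse = math.sqrt(sq_err / n)

    return {
        "r": cov / math.sqrt(var_o * var_p),
        "RMSE": rmse,
        "MAE": abs_err / n,
        "RRMSE": 100.0 * rmse / mean_o,
        "MAPE": 100.0 / n * sum(abs(e / o) for e, o in zip(err, obs)),
        "ENS": 1.0 - sq_err / var_o,
        "LM": 1.0 - abs_err / sum(abs(d) for d in dev_o),
    }


# Illustrative values only (not measurements from the study).
scores = forecast_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

ENS penalises squared errors while LM uses absolute errors, so LM is less sensitive to a few large misses; this may be one reason the two scores differ for the same model.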

model both direct and diffuse photosynthetic-active radiation and output biomass for a range of ecological and agricultural applications have also been developed [21].
In respect to solar energy monitoring or integration into electricity grids, intermittencies in power production are highly driven by cloud variations [40]. However, the ability to develop reliable models to predict short-term (e.g., 5-10 minute) solar radiation can provide future solar systems with a real-time monitoring capability to resolve clean energy challenges by better capturing cloud cover, lifetime, spread or stochastic movements. Also, the option to capture cloud cover variations in a solar ultraviolet index (UV Index) model, such as the one developed previously by Deo et al. [41], can help in skin cancer and eye disease risk mitigation. Developing a PPFD prediction model trained with cloud images may provide useful insights into UV index, solar power production or energy demand monitoring.
In a previous study, the near real-time PPFD prediction model of Deo et al. [39] was based on an adaptive neuro-fuzzy inference system to predict PPFD over 5-minute horizons in Queensland (Australia), using time-lagged SZA data under cloud-free conditions. Utilising the local solar zenith angle (SZA) as the only input variable, they demonstrated good accuracy in predicting the real-time PPFD with changes in SZA for 5-minute and hourly forecasts. Such studies that model real-time solar photosynthetic energy can play a pivotal role in helping explore regional development of the agricultural sector. However, the inclusion of cloud cover (which is vital for the control of plant growth) was not considered in previous studies. The development of an AI-based model to predict the influence of cloud variations at near real-time, and how the cloud properties (derived from image chromatic information) might control the amount of ground-based photosynthetic-active radiation, is yet to be explored.
This paper develops an artificial intelligence (AI) approach that considers the total sky conditions, addressing the role of cloud cover variations to accurately model PPFD at 5-minute time scales. The contribution and novelty are to build a first deep learning AI method for real-time PPFD forecasting, capturing the influence of cloud properties on measured photosynthetic-active radiation.

A deep learning-based methodology utilising whole sky image characteristics of both the cloud and cloud-free conditions typical of local farming environments incorporates data features from high temporal resolution images, such as those captured by a Total Sky Imager (TSI) or geostationary satellites (e.g., Himawari 8 or 9) providing inter-minute level sky images. The objectives are as follows.

(1) To process TSI-based cloud images corresponding to PPFD measured at 5-minute intervals through a custom-built cloud segmentation algorithm [42] applied to each image, and produce descriptive statistics based on the blue, red, thin and opaque cloud chromatic features (i.e., means, standard deviations, differences, ratios). These are then used to build an optimal set of model inputs

(ELM and MARS) methods are described elsewhere [43][44][45][46][47][48] and

Solar Ultraviolet Radiation Laboratory, a quality-controlled monitoring station measured PPFD and weather conditions since 2011 (Fig. 2b). Located at an elevation of 690 m ASL, Toowoomba is a regional city with a high solar energy potential and is also classified as a regional centre for agricultural activities, which makes the PPFD forecast models an advantageous tool for practical applications in agricultural sectors. The specific study site also has a relatively large number of full sunshine days and a clear hemispheric view of the solar horizon [77], which also makes it an ideal site to implement the CLSTM model for real-time forecasting of photosynthetic-active radiation.

<Fig 2(a-d)>
To build the proposed CLSTM predictive model, high-quality, yet cloud-influenced measurements of PPFD were acquired over the austral summer solstice period (01 to 31 Mar 2013).

The data were collected using a Quantum sensor (LI-190R; LI-COR, Lincoln, USA) connected to a CR100 Campbell Scientific data logger (Logan, USA) (Fig. 2)

is different for different days or times. This is perhaps due to cloud cover or atmospheric conditions (e.g., ozone, aerosols, water vapor). Fig. 3

further considered in this study, assuming everything captured by the user threshold to be thin cloud.

The TSI Analyser was applied to a 1-month dataset with 5-minute interval cloud images considering over 200,000 images collected at a 480 × 320 spatial resolution. These whole-sky images have been captured using TSI440 [84][85][86] used in previous research (e.g., [82,87,88]

green-blue (RGB) format for further image processing.

produced the average of the whole sky blue (Bav), whole sky red (Rav), as well as the statistical features based on standard deviation, ratios, or differences of the blue (B) and red (R) pixel values for clouds that represent the estimated proportion of pixelized cloud features likely to be a function of the photosynthetic-active radiation received at a measuring sensor. To analyse the degree of association between cloud movement and an instantly measured PPFD value, a cross correlation analysis is performed to determine the covariance measured by rcross prior to developing the proposed CLSTM model. Table 1 includes the rcross used to determine the order of our model input combinations, presented in Table 2. It is evident that the average of whole sky-blue pixel in a total sky image appears

To corroborate the findings in Table 1 we now inspect visually the covariance in cloud chromatic properties against measured photosynthetic-active radiation. Figure 4 displays a scatterplot of the cloud cover statistics as well as SZA data that are regressed against the measured PPFD in the model training phase. The whole sky-blue average is seen to attain the highest coefficient of determination (r² = 0.549) with respect to the PPFD values. The other significant predictor variables are found to be the blue cloud pixel standard deviation (r² = 0.403), solar zenith angle (r² = 0.403) and the standard deviation of the whole sky-blue (r² = 0.365).
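As an illustration of the whole-sky chromatic statistics described above, the following minimal sketch summarises the red and blue channels of a single image. It is not the TSI Analyser or the custom segmentation algorithm: the function name `whole_sky_stats`, the dictionary keys, and the choice to form the ratio and difference from the channel averages (rather than per pixel) are assumptions made here for illustration.

```python
from statistics import mean, pstdev


def whole_sky_stats(pixels):
    """Summarise the red and blue channels of one sky image.

    `pixels` is an iterable of (R, G, B) tuples covering the sky region.
    Returns channel averages (Rav, Bav), population standard deviations,
    the red-to-blue ratio and the blue-minus-red difference of the averages.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")

    red = [p[0] for p in pixels]
    blue = [p[2] for p in pixels]
    r_av, b_av = mean(red), mean(blue)

    return {
        "Rav": r_av,
        "Bav": b_av,
        "Rstd": pstdev(red),
        "Bstd": pstdev(blue),
        # Ratio of averages; undefined when the blue average is zero.
        "ratio_R_B": r_av / b_av if b_av else float("nan"),
        "diff_B_R": b_av - r_av,
    }


# Three illustrative pixels (not data from the study).
stats = whole_sky_stats([(100, 0, 200), (120, 0, 220), (140, 0, 240)])
```

Applying the same function separately to pixels already classified as thin or opaque cloud would give cloud-specific counterparts of these statistics, under the same assumptions.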
It is especially notable that the ratio of red to blue sky and the difference between the blue and red pixels in a whole sky image appear to be weakly correlated with the PPFD data series and, therefore, may not contribute significantly towards improving the proposed CLSTM model. Taken together, the present analyses indicate that at least two of the cloud chromatic properties (i.e., the whole sky blue and blue cloud pixel averages) are more strongly correlated with measured PPFD than the solar zenith angle used in earlier studies. This suggests that including cloud cover properties may be crucial for improving earlier models of photosynthetic-active radiation (e.g., [39]).

GHz processor running on 32GB memory. Figure 6 illustrates the model development stage and

As this study's intent is to build a forecast model that can accurately predict the photosynthetic-active radiation at a future timescale over near real-time (5-minute) intervals, we have further explored the cross correlation between cloud chromatic properties and photosynthetic-active radiation (or PPFD) using a time-lagged correlogram. Figure 7 identifies the covariance between PPFD (i.e., target) and SZA, along with all of the other cloud-image derived predictor variable data in the model training phase. Evidently, the lagged series show a strong (±) serial correlation exceeding the statistically significant region at the 95% confidence level, which is indicated by a blue line.

Interestingly, the correlation coefficient in terms of the time-shifted cloud properties for non-zero lag (i.e., occurring for an input that was regressed on a target at a different timescale) is also prominent for some of the inputs (e.g., thick clouds, average of red pixel values in the cloud cover, difference between whole sky red and the blue pixels, and the ratio of red to the blue pixel values in the clouds).
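The time-lagged correlogram can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it computes the Pearson correlation between an input shifted by a given number of 5-minute steps and the target, and it assumes the common large-sample approximation ±1.96/√N for the 95% significance band. The function names and the sample series are hypothetical.

```python
import math


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t] for a lag >= 0."""
    if len(x) != len(y) or lag < 0 or len(x) - lag < 2:
        raise ValueError("series must match in length and overlap by >= 2 points")

    xs = x[:len(x) - lag]   # predictor, shifted back by `lag` steps
    ys = y[lag:]            # target aligned with the shifted predictor
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlogram(x, y, max_lag):
    """Return (lag, r, significant) tuples for lags 0..max_lag.

    `significant` uses the assumed 95% band of +/- 1.96 / sqrt(N).
    """
    bound = 1.96 / math.sqrt(len(x))
    rows = []
    for lag in range(max_lag + 1):
        r = lagged_correlation(x, y, lag)
        rows.append((lag, r, abs(r) > bound))
    return rows


# Illustrative series in which the target follows the predictor by one step.
predictor = [3, 1, 4, 1, 5, 9, 2, 6]
target = [0, 3, 1, 4, 1, 5, 9, 2]
rows = correlogram(predictor, target, max_lag=2)
```

In this toy example the lag-1 correlation equals 1 because the target is simply the predictor delayed by one step; lags where a cloud statistic exceeds the band would be candidates for lagged model inputs.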

This indicates a strong non-linear association between cloud chromatic properties and photosynthetic-

The CNN model's hyperparameters were also optimised, with the following options.

 the process including a forward and backward deletion process to reach the optimal MARS equation.

In the forward phase, a 'naïve' model with just the intercept term is used with iterative addition of

Table 2) and a reference model

average, standard deviation of the blue pixels, blue cloud average pixels, standard deviation of the blue cloud pixels, opaque cloud pixels, standard deviation of the red cloud pixels, whole sky red average pixels, and the SZA time series yielded the most accurate performance. For the case of the LSTM model (Fig. 8b), the best performance is attained through M13:

Interestingly, the best performance among all tested models is attained by different input combinations that use both the cloud cover properties and the solar zenith angle as an input variable.

reference models are clustered much further away from the axis representing the observed PPFD whose RMS-centred difference certainly separates them away from the cloud cover-based models, and secondly, the CLSTM model utilising cloud properties (indicated in red) is at the closest location to the observed PPFD, and also attains the highest correlation among all tested predictive models. It is also observable that all the cloud cover-based models are within a smaller cluster (and hence, demonstrate comparable performance) whereas those utilising SZA only are more scattered. This suggests that the inclusion of cloud cover is necessary to optimise all the DL and ML models, but among all these models, the CLSTM remains the superior choice to forecast the 5-minute PPFD dataset.

such as Himawari 8 or 9, operating at roughly 10-minute intervals and relatively high spatial resolutions, may become good suppliers of sky images to be used as inputs for the CLSTM model to generate predicted PPFD or other components of solar radiation at appropriate temporal resolutions.
Other than agricultural applications, our CLSTM model incorporating cloud conditions also has potential use in public health and energy sectors. In an earlier study, Deo et al. [88]

relative to a comparable performance for cloud cover-based models (Fig. 10)

the need for ground-based inputs that are data expensive for many regional locations can be eliminated. Furthermore, fish-eye lens or adapters used in mobile phones may also be able to supply

in both remote and regional locations is a necessary step to help in direct harnessing of solar energy, biofuels from microalgae, agricultural crop monitoring and supporting bio-physical sectors where photosynthetic-active radiation needs to be monitored.

Figure 1
Schematic illustration of Convolutional Neural Network-Long Short-Term Memory Network (CLSTM) predictive framework. CNN is used for feature extraction from solar zenith angle (SZA) and cloud chromatic properties from Total Sky Imager (TSI), and LSTM is used for time sequential modelling of the photosynthetic-active radiation (represented as photosynthetic photon flux density, PPFD).

Figure 4
Scatterplot-based correlation analysis of the 5-minute PPFD (i.e., the objective variable) in respect to the 17 cloud-image derived predictor variables used in training the proposed CLSTM model. Least square regression lines with the coefficient of determination (r²) are included for each sub-panel, with the definition of each cloud-image derived predictor variable as per Table 1.

Figure 6
Schematic diagram of the relevant steps in designing the CLSTM predictive model.

Figure 7
Correlograms plotted to identify the degree of covariance between PPFD (i.e., the objective variable) and the 17 different cloud-image derived predictor variables within the CLSTM model's training phase. The y-axis shows the cross-correlation coefficient, rcross, with the blue line representing the level at the 95% confidence interval.

Figure 8
Scatterplots of forecasted against observed PPFD values (µmol of photons m⁻² s⁻¹) emulated by the CLSTM model in the testing phase, compared with benchmark models. Only the optimal results (out of all designated models, M1 to M17) for each predictive algorithm based on best input combinations utilising cloud chromatic statistics and SZA as predictors, as per Table 2, are shown.

Figure 9
The percentage frequency of the forecasted error generated by the CLSTM model against the deep learning (i.e., LSTM, CNN, DNN) and machine learning (ELM, MARS)-based models developed using best input combinations utilising cloud chromatic statistics and SZA as the predictors, in accordance with Table 2.

Figure 10
Taylor diagram with a concise statistical summary of how well the simulations from the CLSTM predictive model match with the other models in terms of their correlations between observed and forecasted PPFD, root-mean-square difference and the ratio of the variance in the testing phase. Only the optimal models with cloud cover properties (i.e., M8, M13, M12, M5, M11 and M10) and without cloud properties (i.e., M18 trained with SZA as input variable) are shown.

Figure 11
Boxplot of the absolute forecasted error in PPFD: |FE| = |PPFD_i^for − PPFD_i^obs| within the testing phase using the cloud cover-based and the SZA only reference models. Figure legend should also indicate what the line, box, whiskers and points represent.

Figure 12
Empirical cumulative distribution function (ECDF) of the PPFD forecasting error |FE| in the testing phase.

Figure 13
The effect of cloud cover properties used as inputs for the CLSTM model with 5-minute forecasted PPFD averaged over the entire testing dataset from 07:00 AM to 05:00 PM.
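The absolute forecast error |FE| used in the boxplot caption and its empirical cumulative distribution function (ECDF) can be computed with a few lines. This is a minimal sketch with hypothetical function names and illustrative values; it is not the plotting code behind the figures.

```python
def absolute_errors(forecast, observed):
    """|FE| = |PPFD_for - PPFD_obs| for each paired sample."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have equal length")
    return [abs(f - o) for f, o in zip(forecast, observed)]


def ecdf(values):
    """Return sorted values and the proportion of samples <= each value."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one value is required")
    return xs, [(i + 1) / n for i in range(n)]


def fraction_within(values, threshold):
    """Share of errors that do not exceed `threshold`."""
    return sum(v <= threshold for v in values) / len(values)


# Illustrative values only (not results from the study).
fe = absolute_errors([110, 190, 330, 400], [100, 200, 300, 400])
xs, ps = ecdf(fe)
share = fraction_within(fe, 10)
```

Reading the ECDF at a fixed error threshold gives the share of forecasts within that tolerance, which allows models to be compared at any chosen error level.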