Deep learning-based estimation of rice yield using RGB image

Crop productivity is poorly assessed globally. Here, we provide a deep learning-based approach for estimating rice yield using RGB images. During the ripening stage and at harvest, over 22,000 digital images were captured vertically downwards over the rice canopy from a distance of 0.8 to 0.9 m, and rice yields were measured in the corresponding areas, ranging from 0.1 to 16.1 t ha −1 . A convolutional neural network (CNN) applied to these data at harvest explained 70% of the variation in rice yield with a relative root mean square error (rRMSE) of 0.22. Images obtained during the ripening stage can also be used to forecast the final rice yield. Our work suggests that this low-cost, hands-on, and rapid approach can provide a breakthrough solution for assessing the impact of productivity-enhancing interventions and identifying fields where these are needed to sustainably increase crop production.


Introduction
The global demand for staple crop products is expected to increase by 60% by 2050, mainly because of population growth, per capita income growth, and the use of biofuels 1 . To meet this estimated future demand, crop production must be enhanced in an environmentally sustainable manner in the context of increasing competition for water, land, and labour, and under potentially more extreme weather conditions associated with climate change 2 . As conversion of carbon-rich and biodiverse natural ecosystems to cropland causes greenhouse gas emissions and further climate change, it is necessary to make effective use of existing cropland to further increase production through sustainable intensification, increasing yield and reducing yield gaps while reducing negative environmental impacts 3,4 . Furthermore, agriculture needs to address problems of poverty and poor food and nutrition security for smallholder farmers. Despite the importance of these goals in agriculture, crop productivity is poorly assessed, especially in the global South, where there is a need to monitor agricultural productivity and evaluate the impact of productivity-enhancing interventions 5 . There are three well-known approaches for assessing crop yield: self-reporting, crop cutting, and remote sensing technologies. However, self-reported data from smallholder farmers are often inaccurate 9 . Crop cutting, wherein a sub-section of a plot is physically harvested, is time- and labour-consuming, and difficult to scale to large areas under financial limitations.
Remote sensing technologies require expensive instruments, such as satellites, unmanned aerial vehicles (UAVs), and specialised sensors in many cases, which makes them difficult to use in practice in the global South. The absence of reliable agricultural statistics is a serious constraint for both agricultural research and policy.
With recent advancements in computational technology, ground-based images captured by low-cost devices together with so-called "machine learning" approaches have received great interest. Machine learning technology is one of the most remarkable innovations of the last decade 7,8 . Deep learning, a branch of machine learning often applied in a supervised manner, mainly builds on convolutional neural networks (CNNs) for image tasks. A remarkable feature of CNNs is their capability for image analysis. They have already been applied in various situations, including language translation 9 , protein structure prediction 10 , board games 11 , and agriculture. To develop a practical CNN model, a large-scale combination of images and supervising data is required. The desirable target objects or crop characteristics are those that are relatively easy to evaluate visually for massive data collection. For these reasons, many earlier studies applying CNNs to agriculture focused on the classification of crop biotic 12,13,14 and abiotic stresses 15 , and the estimation of crop growth-related traits such as biomass 16,17,18,19 , leaf area index 20 , grain number 21 , and panicle density 22,23 , which could help indirectly predict crop yield through the use of crop simulation models and their empirical relationships with yield. However, to the best of our knowledge, no study has directly estimated crop yield using deep learning with ground-based images.
This study focuses on rice, which is by far the most important of the big three cereals in terms of human consumption in low- and lower-middle-income countries and is mainly cultivated by smallholder farmers 24 . We established a database of ground-based digital images of rice taken during the ripening stage and at harvest, together with the corresponding yields, collected from seven countries using a standardised data collection procedure. We then developed a CNN model that covered a wide range of yield levels, rice growing environments, cultivars, and crop management practices, such as crop establishment methods and fertiliser management. We assessed the robustness of the model under various conditions that could potentially affect the yield estimation. We demonstrate that rice yield can be rapidly and effectively estimated at low cost, without labour-intensive crop cuts or expensive remote-sensing technologies, at harvest and during the ripening stage with satisfactory accuracy.

Database on rice canopy image and grain yield
The multinational dataset of rice canopy images and the corresponding rough and filled grain yields and aboveground dry weight was established with a standardised data collection procedure for 4820 harvested plots and 22067 images in various on-station and on-farm field experiments and farmers' fields or seed production plots across 20 locations in seven countries (Fig. 1a, Supplementary Fig. S1, Supplementary Tables S1, S3). The database includes 415 plots from on-farm fields, accounting for 9% of the total plots. Côte d'Ivoire, Senegal, and Japan accounted for 56%, 32%, and 5% of the total data points, respectively (Fig. 1b). The dataset covers both lowland and upland rice production systems, contains 462 rice cultivars, and includes two crop establishment methods (direct seeding and transplanting) (Supplementary Table S2). N-P-K fertiliser application ranged from 0 to 200 kg N ha −1 , 0 to 120 kg P 2 O 5 ha −1 , and 0 to 120 kg K 2 O ha −1 , respectively (Supplementary Table S1). The observed rough grain yield ranged from 0.1 to 16.1 t ha −1 with an average of 5.8 t ha −1 and showed a normal distribution (Fig. 1a). Hence, our dataset covers a wide range of yield levels, crop management practices, cultivars, and growing environments of rice (Supplementary Table S1). We found strong and positive relationships between rough grain yield, aboveground dry weight, and filled grain yield (Supplementary Fig. S2). Further data analyses using the CNN model focused only on rough grain yield. The main part of the dataset was split into three parts: (i) development and evaluation, consisting of training (72% of the harvested plots in this part), validation (14%), and test (14%) data; (ii) robustness; and (iii) prediction (Fig. 1c). The prediction dataset consisted of data collected in Moshi, Tanzania, and Tokyo, Japan. This means that the prediction accuracy of the developed CNN model was evaluated on an "unknown" and "independent" dataset in this study.
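The plot-level split described above (roughly 72:14:14 for training, validation, and test) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, the fixed seed, and the rounding behaviour are assumptions. Splitting at the plot level keeps replicated images of one plot from leaking across subsets.

```python
import random

def split_plots(plot_ids, ratios=(0.72, 0.14, 0.14), seed=42):
    """Randomly split harvested-plot IDs into training/validation/test
    subsets at roughly 72:14:14. Splitting by plot (not by image)
    ensures replicated images of a plot stay in one subset."""
    rng = random.Random(seed)
    ids = list(plot_ids)
    rng.shuffle(ids)
    n = len(ids)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

Any image-level augmentation (such as flipping) would then be applied only after this split, so augmented copies never cross subset boundaries.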
Furthermore, among the five cultivars grown in these locations, one cultivar in Tanzania (cv. TXD 306) was not included in any other dataset.
A CNN model to estimate rough grain yield from canopy images

The CNN structure used in the present study has five convolutional layers with one fully connected layer in the main stream and three branching layers (Supplementary Fig. S3). The learning rate and batch size during the learning process were optimised with 10 replications. The combination of a learning rate of 0.0001 and a batch size of 32 resulted in the best performance for the test dataset (Supplementary Fig. S4). With this combination, the best model of the learning process was generated at epoch 61, and this model was used for all of the following analyses (Fig. 2a). The developed CNN model explained approximately 70% of the variation in yield for both the validation and test data, with a relative root mean square error (rRMSE) of approximately 0.22 for both (Fig. 2b-c). The relationship between the observed and estimated yields fit well to the 1:1 line for both datasets. The deviation between the estimated and observed yields of individual cultivars in the test dataset was plotted against the number of harvested plots in the training dataset (Fig. 2d). Cultivars with more than 25 plots in the training dataset tended to have less than 1 t ha −1 deviation. The empirical relationships illustrated as upper and lower boundary curves in Fig. 2d indicate that increasing the number of data points 10-fold can reduce the error of the yield estimation by 50%.
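The hyper-parameter tuning described above, where each batch-size/learning-rate pair is trained with replications and the pair with the lowest mean test rRMSE wins, can be sketched generically. `train_fn` and all names here are hypothetical stand-ins for the actual training routine, not the authors' implementation:

```python
from itertools import product

def select_hyperparameters(train_fn, batch_sizes, learning_rates, n_reps=10):
    """Grid search mirroring the tuning described above: each
    (batch size, learning rate) pair is trained `n_reps` times;
    `train_fn(batch_size, lr, rep)` is assumed to return the test-set
    rRMSE of the best-validation-epoch model for that run, and the
    pair with the lowest mean rRMSE is selected."""
    best = None
    for bs, lr in product(batch_sizes, learning_rates):
        mean_rrmse = sum(train_fn(bs, lr, r) for r in range(n_reps)) / n_reps
        if best is None or mean_rrmse < best[0]:
            best = (mean_rrmse, bs, lr)
    return {"batch_size": best[1], "learning_rate": best[2], "rrmse": best[0]}
```

With the grids reported in the Methods (batch sizes 16-128, learning rates 0.0001-0.001), this loop would reproduce the selection of batch size 32 and learning rate 0.0001, provided `train_fn` wraps the actual PyTorch training.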
The accuracy of the CNN model was further evaluated using the prediction dataset. The model estimated the rough grain yield with an R 2 of 0.487 and rRMSE of 0.174 across five cultivars in two countries (Fig. 3a). It underestimated the yield of cv. Koshihikari; the deviation between the estimated and observed yield was higher for this cultivar than for the others. However, the model successfully estimated the yield variation observed in cv. TXD 306, which was included solely in the prediction dataset. To determine the number of images that should be used for proper yield prediction per plot, the rough grain yield was predicted using different numbers of replicated images (1-5) per harvested plot and averaged across the images per harvested plot (Fig. 3b). There were no apparent differences in R 2 and rRMSE between the observed and predicted yields using different numbers of images, with R 2 of 0.469 to 0.491 and rRMSE of 0.175 to 0.180. When comparisons were made among the yields predicted using different numbers of images, they were strongly and positively correlated.
To understand how the CNN model reads the images and estimates rice yield, we used the occlusion-based visualisation technique to estimate the additive effect on yield estimation 25 . Briefly, a specific part of the image was masked by a grey square, and the yield estimate for the masked image was subtracted from that for the original image. The calculated values can be interpreted as the additive effect of the masked region on the yield estimation and were mapped onto the original image with a colour scale (Supplementary Fig. S5). This analysis revealed that regions containing many rice panicles have a positive effect, whereas regions with leaves, stems, or ground have a negative effect on yield estimation. The importance of the panicles for yield estimation was further validated using panicle removal experiments conducted in Kyoto, Japan. Two panicles per hill were sequentially removed from the canopy, and the rough grain weight and canopy images were recorded for each sequence (Fig. 4a-b). The yield was estimated using the CNN model for each sequence of panicle removal. The heat map analysis confirmed that the regions containing many panicles had a positive effect on yield estimation in the initial rice canopy, and that these regions diminished as the panicles were gradually removed (Fig. 4c). In contrast, as panicles were removed, regions with overlapping or senescent leaves in the lower position tended to have positive effects (Fig. 4d). The estimated yield for the canopy with no panicles was 1.60 t ha −1 , which implies that, apart from the presence of panicles, information on the background canopy may also have been utilised for yield estimation.

Robustness of the developed CNN model
The robustness of the CNN model to image quality was tested using images taken (i) from different shooting angles, (ii) at various times of day during the five days before harvest, and (iii) on different shooting dates during the ripening stage. The shooting angle represents human error, while the time of day reflects the changing natural environment, which causes variation in the contrast or colour balance of the image. The shooting date is important for assessing when rice yield can be effectively predicted by our model during the ripening stage.
To determine the range of shooting angles acceptable for the developed CNN model, we estimated rice yield using images acquired from eight shooting angles (in 10° increments from 20° to 90° (control)) in Mbe, Côte d'Ivoire (Fig. 5a). The deviation between the estimated and observed yields was averaged across 25 harvested plots at each angle. The deviation ranged from −3.7 to 2.4 t ha −1 when the depression angle was 20° (Fig. 5b), and it decreased with an increase in the depression angle.
When the outlier was excluded, the deviation ranged from −0.45 to 2.44 t ha −1 at 60°, which was comparable with that at 90° (control). The heat map analysis of images taken at shallower angles showed that regions showing the inner structure of the canopy, such as stems or leaves in the lower position, tended to have a significant negative effect on the yield estimation (Fig. 5c).
Furthermore, the regions with overlap between the leaves and panicles in images at shallower angles, such as 20° to 50°, did not have a positive effect like those in the image at the control angle (e.g., the bottom-left parts of the 20° image and the upper parts of the 50° image). The estimation accuracy analysis showed that greater depression angles resulted in better estimation accuracy (Fig. 5d). When the depression angle was greater than 60°, the R 2 and rRMSE calculated between the estimated and observed yields ranged from 0.435 to 0.493 and 0.180 to 0.219, respectively. Strong correlations were found among the estimated yields from 70°, 80°, and 90°, with R 2 greater than 0.76 and rRMSE of less than 0.11.

The image of the rice canopy was captured by a fixed-point camera every 30 min for 5 successive days before and at harvest in Kyoto, Japan (Fig. 6a). The images for every 2 h on 29 August 2020 are shown as an example of a clear sunny day (Fig. 6c). The image taken at 0600 hrs has a different colour balance compared to the others because of the lower irradiation. The images taken at 0800 hrs, 1400 hrs, and 1600 hrs have higher contrast because of the shallower angle of solar radiation. The images taken at 1000 hrs and 1200 hrs were bright and had lower contrast because of the greater angle and intensity of the solar radiation. Despite such variation in light environments, the CNN model provided stable outputs throughout the daytime with a slight overestimation (Fig. 6b). The heat map analysis revealed that the CNN model showed stable recognition of the panicles regardless of the image quality (Fig. 6d), which led to a robust estimation of yield.
To assess from when during the ripening stage the CNN model can predict rice yield, the canopy image was taken once a week after 50% heading until harvest for 22 cultivars in Mbe, Côte d'Ivoire. The yield estimated in the early ripening stage tended to be lower than the yield observed at harvest, whereas such a trend was not observed with the yield estimated in the later ripening stage (Fig. 7a). This indicates that the model recognises mature panicles (Fig. 4, Supplementary Fig. S5) but not immature panicles. When the data from the 22 cultivars were pooled, the ratio of the estimated yield to the observed yield ranged from 0.3 to 0.6 just after 50% heading, and the y-intercept of the segmented regression was 0.517. The ratio increased linearly during ripening, and the relationship reached a plateau at approximately 4 weeks after 50% heading (WAH) (Fig. 7b). A similar trend was also observed in Madagascar (Supplementary Fig. S6), although there the relative yield plateaued within 2 WAH. The R 2 values between the yield estimated during 2 to 4 WAH and the observed yield ranged from 0.370 to 0.410, whereas it was 0.572 at harvest (Fig. 7c). The rRMSE between the yield estimated after 3 WAH and the observed yield ranged from 0.193 to 0.196. When comparisons were made among the estimations after 3 WAH, the R 2 and rRMSE ranged from 0.657 to 0.767 and 0.135 to 0.228, respectively. The CNN model slightly underestimated the yield at 3 WAH compared with the observed yield, whereas it slightly overestimated the yield at harvest (Fig. 7d). A correction of the estimated yield using the empirical relationship observed in Fig. 7b was conducted to reduce the deviation, especially in the earlier ripening stage. When the corrected estimation at 2 WAH was compared with the observed yield, R 2 and rRMSE improved to 0.381 and 0.196, respectively (Supplementary Fig. S7).
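The correction of early ripening-stage estimates can be expressed as dividing the CNN estimate by the expected ratio of estimated to final yield at that date (the empirical curve of Fig. 7b). This is a hedged sketch of that idea; `ratio_fn` is a hypothetical callable standing in for the fitted curve, not a function from the study:

```python
def correct_early_estimate(estimated_yield, days_after_heading, ratio_fn):
    """Compensate early-stage underestimation: divide the CNN yield
    estimate (t/ha) by the expected estimated-to-final-yield ratio at
    the given number of days after 50% heading. `ratio_fn` represents
    the empirical segmented-regression curve (an assumption here)."""
    ratio = ratio_fn(days_after_heading)
    return estimated_yield / ratio
```

For example, if the fitted curve said the model captures only half the final yield at 2 WAH, an estimate of 3.0 t ha −1 would be corrected to 6.0 t ha −1.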

Discussion
The multinational database of canopy images at harvest and rough grain yields, collected using the standardised data collection procedure under a wide range of rice growing conditions (Fig. 1, Supplementary Figs. S1, S2), contained more than 22,000 images and covered a large variation in rice yield. This dataset enabled the development of a CNN model that can estimate rice yield under a wide range of conditions (Fig. 2a, Supplementary Figs. S3, S4). No other studies have developed a model that predicts rice yield accurately using only RGB images captured with a commercially available digital camera. The results from our analysis using the prediction dataset, which is independent of the others and includes the unique cultivar cv. TXD 306, clearly demonstrate that our CNN model is capable of estimating rice grain yield.
It has repeatedly been reported that satellite data, alone or in combination with other data and models, can estimate crop growth-related traits such as aboveground biomass and leaf area index, and indirectly predict crop yield in farmers' fields 26,27,28,29 . UAVs have been proposed as a powerful tool for estimating aboveground biomass utilising various sensors 30,31 . The accuracy of estimation directly from rice canopy images in the present study is comparable to or even higher than that shown in earlier studies, and was achieved without any expensive equipment. Furthermore, the accuracy was evaluated using an independent prediction dataset, which has rarely been tested in earlier studies.
Our model was able to estimate rice yield with satisfactory accuracy on the most comprehensive dataset available to date in terms of growing environments, camera settings, and number of cultivars. CNN-based object detection has enabled the detection of rice 22 and wheat 23 panicles, and this can be a potential approach for indirect yield estimation. However, it is well known that other yield components interact with panicle number and strongly affect rice yield 32 . Unless models for predicting these other yield components are also developed, a model that detects panicles alone would not be sufficient for yield estimation.
However, unknown conditions causing poor estimations by the CNN model should always be assumed when considering the scale and diversity of rice cropping systems globally. For instance, the dataset does not include canopies affected by severe lodging, pests, insects, weeds, or abiotic stresses such as heat, drought, and flooding. Most of the data points are from irrigated lowland rice fields with relatively higher yields, and data from farmers' fields are limited. Thus, further data collection is required, especially for low-yielding and rainfed environments, and assessment of the potential use of the model for stressed or injured rice plants is warranted. The most practical solution for adapting the model to such new conditions would be to add new data to the database and develop a new model. The results in Fig. 2d suggest that better accuracy can be achieved with more data points. As a criterion, approximately 25 harvested plots are needed for adaptation to new conditions with practical accuracy, which should be validated when developing a sampling framework for improving and adapting the model to new conditions. The occlusion-based method for visualising the distribution of the additive effect on the yield estimation clearly indicates that the CNN model autonomously learned the contribution of panicles to yield only from the relationships between input canopy images and the observed yields (Fig. 4c, Fig. 5c, 6d, and Supplementary Fig. S5). However, the CNN model predicted a positive yield for canopies with no panicles in the panicle removal experiment (Fig. 4d). Similarly, the model estimated a positive yield for the images taken around the 50% heading date, when the panicles were immature (Fig. 7b, Supplementary Fig. S6). Although the canopy used in the panicle removal experiment is unrealistic, as it has substantial biomass without panicles at harvest (Fig. 4b), these results imply that the model could have also utilised information on the background canopy, such as the amount of leaves, planting density, or stem size, for yield estimation.
The robustness of the CNN model to image quality is crucial because images are not necessarily acquired under optimal conditions. Based on our assessment of the robustness of the model, the results suggest that (i) the model can be applied to camera depression angles from 60° to 120° (Fig. 5), (ii) the model output is only slightly affected by changing light intensity, without any reference board or colour checker (Fig. 6b), and (iii) forecasting the yield prior to harvest is possible using the model and images acquired at 3 WAH or later. Three WAH corresponded to approximately 10 to 20 days before harvest (Supplementary Fig. S8). We also found that a single image per plot was sufficient for a proper estimation of yield. These results clearly show that the CNN model offers great advantages for application under field conditions. In particular, yield forecasting has great potential benefits in terms of field management, marketing, distribution, and policy decisions. By correcting the output of the CNN model based on the relationship shown in Fig. 7b, the yield may be forecasted even earlier than 3 WAH (Supplementary Fig. S7). However, this relationship seems to differ across growing conditions and sets of cultivars, as the ratio of estimated to observed yield saturated earlier in Madagascar than in Côte d'Ivoire (Fig. 7b, Supplementary Fig. S6). The reason for such differences between the two locations is not known, although it may be a combined effect of various factors such as cultivar-specific dynamics of grain filling, growing environment, soil fertility, and water management; therefore, further studies are warranted. Additionally, the robustness of the CNN model could be evaluated at different distances from the canopy, which would further enhance the applicability of the model in the future.
The CNN structure used in this study has only several convolutional layers (Supplementary Fig. S3) and is much smaller than representative structures for image recognition 33 . This implies that the developed model can easily be transferred to mobile devices such as smartphones. The model does not require any type of colour checker, and it accepts canopy images taken at depression angles from 60° to 120°, at any time of the day, from 3 WAH onwards. The flexibility and robustness of the developed model provide a breakthrough solution for non-destructive, rapid, and on-site evaluation of rice productivity, which enables assessing the impact of productivity-enhancing interventions and identifying fields where these are needed to sustainably increase crop production.

Methods
Construction of database for rice canopy image and rough grain yield.
Field campaigns were conducted in 2019 and 2020 at 20 locations in seven countries (Côte d'Ivoire, Senegal, Japan, Kenya, Madagascar, Nigeria, and Tanzania). Data on rice growth traits and digital images were collected in seed production plots as well as experimental fields at research stations and farmers' fields (Supplementary Table S1). At maturity, RGB images were captured vertically downwards over the rice canopy from a distance of 0.8 to 0.9 m using a digital camera (Supplementary Fig. S1a). The digital cameras used in this study are listed in Supplementary Table S1. Five images were taken per harvesting plot by slightly shifting the camera for image augmentation. The rice canopy images cover approximately 1 m 2 , which corresponds to the harvesting area proposed by the Food and Agriculture Organisation (FAO) and used by Japan for agricultural statistics 34 . Rough grain yield, comprising filled and unfilled grains, was measured at the corresponding plot or at larger plots where yield data were collected in field experiments (Supplementary Table S1). Rice yields were reported at 14% moisture content. The aboveground total dry weight and filled grain weight were also recorded in most studies. Rice yield level, rice production system, rice variety, and key crop management practices are shown in Supplementary Table S1. The database consists of eight categories, as presented in Fig. 1c. For most of the training, validation, and test data, we used only a single image per plot. These three categories form the main part of the database and were randomly split at a ratio of approximately 72:14:14. After splitting the data, the training images were augmented 4-fold by flipping them horizontally, vertically, and in combination, which resulted in 17764 images of training data. For the panicle removal, angle, shooting date (see the following sections), and prediction data, we used five replicated images per plot.
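The 4-fold flip augmentation described above can be sketched with NumPy array slicing; the function name is an illustrative assumption, but the transformations (identity, horizontal flip, vertical flip, and both combined) are exactly those described:

```python
import numpy as np

def augment_flips(img):
    """4-fold augmentation of a training image: the original plus its
    horizontal flip, vertical flip, and the combination of both.
    `img` is an H x W x 3 RGB array; each output is a flipped view."""
    return [img,
            img[:, ::-1],     # horizontal flip (mirror left-right)
            img[::-1, :],     # vertical flip (mirror top-bottom)
            img[::-1, ::-1]]  # horizontal + vertical flip
```

Applying this to each training image quadruples the training set, consistent with the reported count (4 times the original training images).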
The prediction data consisted of the dataset collected at Moshi (3.45S, 37.38E), Tanzania, and at Tokyo (35.41N, 139.29E), Japan, where the data were not included in any other categories. For the time-of-day data, sequential shooting of the canopy images was conducted using a fixed camera. In total, 4820 yield data points and 22067 images of 462 rice cultivars were used in this study (Fig. 1c, Supplementary Table S2).

Panicle removal, and experiments for robustness evaluation
The panicle removal experiment was conducted at Kyoto (35.2N, 135.47E) and Tsukuba (36.03N, 140.04E), Japan. Five replicated canopy images were acquired for the plot to be harvested. Two panicles per hill at random positions in the canopy were removed, and then five images were acquired. The grain weight of the collected panicles was measured separately. By repeating this process until all panicles were removed from the harvesting plot, a series of images with gradually decreasing panicle number and the corresponding yields was obtained. The dataset from Tsukuba was included in the training, validation, and test data, and the dataset from Kyoto was used to evaluate the impact of panicle removal on the yield estimation.
The angle-changing experiment was conducted at M'bé (7.87N, 5.11W), Côte d'Ivoire. A curved rail with a diameter of 1.8 m was fixed above the canopy to be harvested. By shifting the position of the camera along the rail, images from various depression angles were shot with a constant image centre. The depression angles were set to 20, 30, 40, 50, 60, 70, 80, and 90 (control) degrees. The data for the angle-changing experiment were collected for 25 harvested plots. The daytime experiment was conducted at Kyoto, Japan. A HykeCam SP2 (Hyke Inc., Japan) was fixed above the canopies of cvs. Koshihikari and Takanari. The canopy images were automatically recorded every 30 min from 5 days before the harvest date for Koshihikari and from 11 days prior to harvest for Takanari. After the recording was finished, the plots were harvested following the protocol common to the other experiments. The data for Takanari were used for model development, and the data for Koshihikari were used for the time-of-day analysis.
The shooting date experiment was conducted at M'bé, Côte d'Ivoire, and Marovoay, Madagascar. At M'bé, 22 cultivars grown in 34 plots in total were used. The canopy images of these plots were acquired once a week from 1 to 4 weeks after 50% heading, at 2 days and 1 day before harvest, and at harvest. Only the images taken 2 days and 1 day before harvest were used for model development, while the others were used for the shooting date analysis. After the final image recordings, the rice plants were harvested using the common protocol. At Marovoay, the canopy images of seven plots were recorded from 2 days prior to until 14 days after 50% heading. Six images were taken every 10 min from 1200 to 1250 hrs and were used for the shooting date analysis.
Image processing and development of the convolutional neural network model

The RGB images of the rice canopy were recorded with an aspect ratio of 4:3 or 16:9. For the images recorded at 16:9, the edges of the long side were trimmed to a ratio of 4:3. The images were then resized to 450 × 600 pixels for recording in the database, and resized again to a square of 512 × 512 pixels in 8-bit PNG format as inputs for the CNN model. The CNN (Supplementary Fig. S3) was implemented in the Python language (version 3.7) with the PyTorch framework (version 1.7). The loss function and optimiser were defined as the mean absolute error and the Adam optimiser, respectively. The optimal learning rate and batch size were determined by changing the combination of these hyper-parameters: batch sizes of 16, 32, 64, and 128 and learning rates of 0.0001, 0.0002, 0.0005, 0.0008, and 0.001 were combined, and the learning process was replicated 10 times for each combination. The epoch number was set to 100, and the learning process was conducted by minimising the loss between the estimated and observed yields in the training dataset. The validation loss was also calculated at every epoch, and the model showing the least validation loss was recorded. The rRMSE for the test dataset was calculated for the models with all combinations of the hyper-parameters and averaged across the 10 replications. The best combination of batch size and learning rate was thus determined, and the recorded model was used in the present study.
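The 16:9-to-4:3 trimming step described above can be sketched as follows. Whether the trim was centred is an assumption (the text only says the long-side edges were trimmed), and the subsequent resizing to 450 × 600 and 512 × 512 pixels would typically be done with an image library such as Pillow:

```python
import numpy as np

def crop_to_4x3(img):
    """Centre-crop a 16:9 landscape RGB array (H x W x 3) to a 4:3
    aspect ratio by trimming equal amounts from both edges of the
    long side. Centred trimming is an assumption, not stated in the
    text; resizing to 450 x 600 / 512 x 512 would follow."""
    h, w, _ = img.shape
    target_w = h * 4 // 3           # width that gives a 4:3 ratio
    left = (w - target_w) // 2      # pixels trimmed from the left edge
    return img[:, left:left + target_w]
```

For a 1080 × 1920 frame this trims 240 pixels from each side, leaving a 1080 × 1440 (4:3) image.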
Occlusion-based method to quantify the additive effect on the yield estimation

The occlusion-based method 25 was used to visualise how the model reads the canopy image (Supplementary Fig. S5a, b). Each portion of the original image was covered by a grey square, generating a series of 300 masked images. The rough grain yield was then estimated for each masked image using the CNN model, and the difference from the estimate for the original image was calculated. These values were overlaid on the original image as a heat map (Supplementary Fig. S5c).
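The occlusion procedure above can be sketched generically. This is a simplified illustration, not the authors' code: the patch size, stride, and grey value are assumptions (the study used a series of 300 mask positions), and `predict` stands in for the trained CNN:

```python
import numpy as np

def occlusion_map(img, predict, patch=64, stride=64):
    """Occlusion-based attribution sketch: slide a grey square over
    the image, re-estimate yield for each masked copy, and record
    (original - masked) as the additive effect of each region.
    `predict` maps an H x W x 3 array to a scalar yield estimate."""
    base = predict(img)
    h, w, _ = img.shape
    heat = np.zeros((h, w))
    grey = np.full((patch, patch, 3), 128, dtype=img.dtype)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            masked = img.copy()
            masked[y:y + patch, x:x + patch] = grey
            # Positive value: masking this region lowered the estimate,
            # so the region contributed positively to the yield.
            heat[y:y + patch, x:x + patch] = base - predict(masked)
    return heat
```

The resulting heat map can be overlaid on the original image with a colour scale, as in Supplementary Fig. S5c.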
Statistical analyses, data summarising, and code availability

The 4820 observations of rough grain yield data were summarised by calculating the average, maximum, and minimum yields. The data were categorised by country of collection, and the average yield in each country was calculated. The R 2 and rRMSE were calculated to evaluate the model performance in each analysis. The rRMSE is defined as follows:

rRMSE = (1/ȳ) √[ (1/n) Σ i (ŷ i − y i ) 2 ]

where ȳ is the average of the observed yield, n is the size of the data, and ŷ i and y i are the individual estimations and observations of the yield, respectively. The rough grain yield for the panicle removal, angle, shooting date, and prediction datasets was estimated with five replicated images per harvested plot and then averaged.
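The rRMSE defined above (the RMSE of the estimates divided by the mean observed yield) can be computed as:

```python
import math

def rrmse(observed, estimated):
    """Relative root mean square error: RMSE between estimated and
    observed yields, normalised by the mean observed yield."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n
    return math.sqrt(mse) / mean_obs
```

For example, observations of 5 and 5 t ha −1 estimated as 4 and 6 t ha −1 give an RMSE of 1 t ha −1 and hence an rRMSE of 0.2.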
The standard error of the five replicated estimations was calculated in the panicle removal experiment.
For the changing-angle experiment, the first, second, and third quartiles of the deviation between the estimated and observed yields were calculated across the 25 plots and displayed with their average, maximum, and minimum values as a box plot. For the daytime experiment, the yield estimated every 30 min was averaged across the 6 successive days, and the standard error was calculated. Segmented linear regression was adopted to determine the relationship between days after 50% heading and the relative yield observed in the shooting date experiment; separate segmented models were fitted to the data collected at M'bé, Côte d'Ivoire, and at Marovoay, Madagascar, respectively. The parameters a and b are constants, y is the ratio between the observed and the final yield, and x is the number of days after 50% heading. The parameters c 1 and c 2 are the break points of the segments, and Eq. (3) represents the three-segment regression. The function I is the step (indicator) function, which equals 1 when its condition holds and 0 otherwise. For the dataset from Madagascar in the shooting date experiment, the six estimations from 1200 to 1250 hrs were averaged and defined as the estimation for a plot. The estimations at the seven harvested plots were then averaged, and the standard error was calculated. All analyses in the present study were conducted using Microsoft Excel (Microsoft, Redmond, WA, USA), Neural Network Console software (Sony Network Communications Inc., Japan), and the Python language version 3.7 (http://www.python.org) with the PyTorch framework version 1.7 (https://pytorch.org/). The code to run the developed CNN model is available at https://github.com/r1wtn/rice_yield_CNN.git.
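The step function and one segmented form can be sketched as follows. This is a hedged illustration only: the exact fitted equations for M'bé and Marovoay are not fully recoverable from the text, so the two-segment "linear rise, then plateau" form below is an assumed shape consistent with the reported behaviour (linear increase after 50% heading, saturating at a break point):

```python
def step(condition):
    """Indicator ('step') function: 1 when the condition holds,
    0 otherwise."""
    return 1.0 if condition else 0.0

def segmented_two_piece(x, a, b, c1):
    """Assumed two-segment regression of relative yield y on days
    after 50% heading x: linear (slope a, intercept b) below the
    break point c1, then a constant plateau. The models actually
    fitted used up to two break points (c1, c2); this simpler form
    is a sketch, not the fitted equation."""
    return (a * x + b) * step(x < c1) + (a * c1 + b) * step(x >= c1)
```

The form is continuous at the break point, as the plateau level equals the linear segment evaluated at c1.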

Data Availability
The data that support the findings of this study are available from the authors on reasonable request.

Figure 6
(c-d) Images recorded at 0600, 0800, 1000, 1200, 1400, and 1600 hrs on 29 August 2020 (c) and the corresponding heat maps of the additive effect on the yield estimation (d).

Figure 7
The applicability of the model to the images taken during the grain filling stage. (a) Examples of the images taken at approximately 1 to 4 weeks after 50% heading (cv. IRRI 154). (b) Scatter plot of the estimated yield relative to the final yield plotted against days after 50% heading. The data consist of images of 22