Cotton yield estimation model based on cloud computing

Timely and precise yield estimation is of great significance to agricultural management and macro-policy formulation. To improve the accuracy and applicability of cotton yield estimation models, this paper proposes a new method called SENP (Seedling Emergence and Number of Peaches) based on Amazon Web Services (AWS). First, the spatial position of each cotton seedling in the region was extracted from high-resolution visible-light imagery acquired by an Unmanned Aerial Vehicle (UAV), using the deep-learning U-Net model. Second, Sentinel-2 data were used to analyze the correlation between the multi-temporal Normalized Difference Vegetation Index (NDVI) and the actual yield, so as to determine the weighting factor of the NDVI of each period in the model. Next, the growth state of cotton was graded to determine the number of bolls. Finally, combining boll weight, boll opening rate and other information, the SENP model was used to estimate the cotton yield of the experimental area, and its precision was verified against measured yield data. The experimental results show that the U-Net model can separate cotton seedlings from the background with high accuracy: the precision, recall and F1 value reached 93.88%, 97.87% and 95.83%, respectively. Time-series NDVI accurately reflects the growth state of cotton and yields a predicted boll number for each cotton plant, which greatly improves the accuracy and generality of the yield estimation model. The coefficient of determination (R²) of the yield estimation model reached 0.92, indicating that the SENP model is an effective method for cotton yield estimation. This study also demonstrates the potential and advantages of combining the AWS platform with SENP, owing to the platform's powerful cloud computing capacity, especially for deep learning, time-series crop monitoring and large-scale yield estimation.
This research can provide reference information for cotton yield estimation and for the application of cloud computing platforms.


I. INTRODUCTION
Crop yield estimation plays a vital role in formulating economic policies, and is an important factor in regional economic development, food security and sustainable agricultural development [1]. Cotton is one of the main crops in China. Knowing the growth and yield of cotton is highly beneficial to farmers and the government, because it allows them to adjust management and formulate policies in advance, so as to obtain better economic and environmental benefits [2].
Yield estimation has long been a research hotspot in agricultural science [3][4][5]. With the development of science and technology, research on cotton yield estimation has moved from traditional ground surveys to multi-dimensional, spatio-temporal remote sensing estimation. Yeom proposed an automatic open cotton boll detection algorithm using ultra-fine spatial resolution UAV images [6]. Using NOAA/AVHRR satellite data with high temporal resolution, Dalezios built a time-series NDVI to estimate cotton yield [7]. By integrating the concept of cotton growing areas with similarity analysis of time-series NDVI data, Gao proposed a method of cotton yield estimation [8]. In short, time-series-based cotton yield estimation is an effective approach, but improving the accuracy of the estimation model remains a challenging issue that has yet to be adequately resolved. Moreover, time-series remote sensing imagery demands substantial computing power, and conventional methods hinder the rapid application and dissemination of estimation models.
With the continuous innovation and wide application of computer technology [9][10][11][12][13] and cloud computing [14][15][16][17], agricultural information services have gained new possibilities. Recently, quite a few cloud computing platforms for geospatial data processing have become available, offering big-data processing tools and high-performance computational power [18], including Google Earth Engine (GEE), Amazon Web Services (AWS) and the National Aeronautics and Space Administration (NASA) Earth Exchange (NEX) [19]. They host plentiful imagery archives and data products, and can readily be used for thematic mapping and spatiotemporal analyses, supported by parallel computation and advanced machine learning algorithms [20]. The advent of cloud computing platforms has changed the way massive amounts of large-scale geospatial data are stored, managed, processed and analyzed [21]. Zhang investigated the potential and advantages of the freely accessible Landsat 8 Operational Land Imager (OLI) imagery archive and GEE for accurate tidal flat mapping [22]. Using GEE, Venkatappa analyzed phenological behaviors to determine threshold values of vegetation types for classifying land use categories in Cambodia, and developed a robust phenology-based threshold classification (PBTC) method for mapping and long-term monitoring of land cover changes [23].
The explicit goal of this research is to propose a new cotton yield estimation model that uses a cloud computing platform to map cotton yield accurately. The results can provide technical ideas for more convenient, accurate and widely applicable cotton yield estimation.

A. Study area
In this paper, the Shihezi reclamation area of the 8th Division of the Xinjiang Production and Construction Corps, China, was selected as the study area. It lies between latitudes 44°29′36″ N and 44°29′55″ N and longitudes 86°01′00″ E and 86°01′50″ E. The study area covers about 637.08 mu (about 42.5 ha), as shown in Fig. 1. Xinjiang has unique ecological and climatic conditions, flat and contiguous farmland, and standardized field construction. Cotton planting there is highly mechanized and large in scale, making it the most suitable area in China for remote sensing yield estimation and precision agriculture [24].

B. Datasets
UAV data. The UAV data, used mainly to extract cotton seedlings, were acquired by a Dapeng CW-10 UAV carrying a Canon camera (EF-M18-55). The visible-light image was taken at 11 a.m. on May 23, 2018, with a ground resolution of about 2.5 cm. The weather was good and windless during data collection. The UAV flew at a height of 150 m with a longitudinal overlap of 80% and a side overlap of 60%. The visible-light data were calibrated and corrected in Pix4D, which performs the whole workflow automatically.
Sentinel-2 data. The Sentinel-2 data, used mainly for multi-temporal monitoring of cotton growth, were obtained from AWS. The satellite carries a MultiSpectral Instrument (MSI) at an altitude of 786 km, covering 13 spectral bands with a swath width of 290 km. Depending on the band, the ground resolution is 10 m, 20 m or 60 m. Sentinel-2 data are the only data that contain three bands in the red-edge range, which makes them highly effective for monitoring vegetation health [25].
Ground measured data. The ground measurements are mainly used to calculate intermediate parameters and to verify the results. To record the position and boundary of each ground sample precisely and match them to the UAV data, 60 evenly distributed sample areas were set up in the study area: 40 experimental sample areas and 20 verification sample areas. A rod topped with a red disk was inserted at the center of each sample area. Each sample area measures 3 m × 3 m, so its position and vector boundary were obtained by locating the rod in the image and extending 1.5 m up, down, left and right from the rod's center. Experiments show that ground data located in this way are more accurate than those located with other positioning methods, such as handheld GPS.
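The boundary construction described above reduces to simple coordinate arithmetic. The following Python sketch assumes the UAV orthomosaic is georeferenced in a projected, metre-based system (for example UTM); the function names and the rod coordinates in the example are hypothetical, and the 2.5 cm pixel size is the approximate resolution given for the UAV data.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sample_boundary(center_x: float, center_y: float, half_side_m: float = 1.5) -> List[Point]:
    """Closed square ring (projected metres) extending half_side_m from the rod centre."""
    x0, x1 = center_x - half_side_m, center_x + half_side_m
    y0, y1 = center_y - half_side_m, center_y + half_side_m
    # Counter-clockwise ring; the first vertex is repeated to close the polygon.
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]


def ring_area(ring: List[Point]) -> float:
    """Shoelace formula; used here only to check that the square covers 9 m^2."""
    twice_area = sum(xa * yb - xb * ya for (xa, ya), (xb, yb) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


def half_side_in_pixels(half_side_m: float = 1.5, resolution_m: float = 0.025) -> int:
    """How many UAV pixels the boundary extends from the centre at ~2.5 cm resolution."""
    return round(half_side_m / resolution_m)


ring = sample_boundary(500.0, 200.0)  # hypothetical rod position in metres
print(ring_area(ring))                # 9.0
print(half_side_in_pixels())          # 60
```

At 2.5 cm resolution the 1.5 m offset corresponds to 60 pixels on each side of the rod, so the same square can be cut directly from the image grid if the data are not georeferenced.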

C. Yield estimation
UAV data provide precise seedling emergence information, giving the spatial position and number of seedlings in the region. Sentinel-2 data allow multi-temporal monitoring of cotton growth, from which the boll number is estimated. Combining these results gives the estimated yield of each cotton plant. On this basis, this study proposes a cotton yield estimation model and method called SENP (Seedling Emergence and Number of Peaches), which provides a technical route to more precise cotton yield estimation.

Fig. 2: Technology roadmap of SENP
Cotton seedlings extraction. Deep learning can extract image features automatically and make precise classification and recognition decisions [26]. Therefore, for the high-resolution UAV imagery, this experiment uses a Fully Convolutional Network (FCN), namely the U-Net model, to extract cotton seedlings. Such networks are frequently used for processing remote sensing images and have achieved good results [27][28]. The remote sensing image of the cotton seedlings is first uploaded to the cloud computing platform. The U-Net model, trained in advance and stored on the platform, is then applied to the input image. Finally, the extraction results are converted into point feature classes and stored on the cloud platform, to be overlaid later on the Sentinel-2 data.

Fig. 3: Structure chart of U-Net
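The text does not say how the U-Net output is turned into point features. One common approach, assumed here, is to treat each connected blob of seedling pixels in the binary mask as one plant and keep its centroid. The pure-Python sketch below does this with a breadth-first flood fill and converts pixel positions to map coordinates for a north-up image; the function names, the 4-connectivity choice and the `min_pixels` noise filter are illustrative assumptions, not the authors' implementation.

```python
from collections import deque
from typing import List, Tuple


def mask_to_points(mask: List[List[int]], min_pixels: int = 1) -> List[Tuple[float, float]]:
    """Centroids (row, col) of 4-connected foreground blobs; each blob is taken as one seedling."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    points = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill gathers every pixel of this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_pixels:  # drop specks smaller than min_pixels
                points.append((sum(p[0] for p in blob) / len(blob),
                               sum(p[1] for p in blob) / len(blob)))
    return points


def pixel_to_map(row: float, col: float, x_origin: float, y_origin: float,
                 resolution_m: float = 0.025) -> Tuple[float, float]:
    """Map coordinates of a (fractional) pixel position in a north-up image."""
    # (x_origin, y_origin) is the upper-left corner of the upper-left pixel.
    return (x_origin + (col + 0.5) * resolution_m, y_origin - (row + 0.5) * resolution_m)
```

Touching seedlings merge into a single blob under this scheme, so a production pipeline might need an extra step to split such blobs before counting plants.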
Cotton growth monitoring. The growth state of cotton in each growth period affects yield formation. A multi-temporal remote sensing model therefore has potential advantages over a single-date model for estimating cotton yield. First, the NDVI of each period is calculated from the Sentinel-2 data with Eq. (1). Second, the NDVI of every period is correlated with the measured yield of the sample areas, and the weight of each period in the yield estimation is derived from the magnitude and relative proportion of its correlation coefficient. Using these weights, a comprehensive NDVI (CNDVI) is calculated with Eq. (2) to evaluate the growth state of cotton over the whole growing season. Finally, the predicted boll number per cotton plant is obtained by fitting the measured average boll number of the experimental sample areas against CNDVI.

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$

$$\mathrm{CNDVI} = \sum_{i=1}^{m} a_i \, \mathrm{NDVI}_i \qquad (2)$$

where NIR and R represent the near-infrared and red bands, respectively, $m$ is the number of periods, $\mathrm{NDVI}_i$ is the NDVI of period $i$, and $a_i$ is the weight of the NDVI in period $i$.
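The NDVI and CNDVI calculations described above, together with the boll-number fit, can be sketched in Python as follows. The paper does not give the functional form used to fit boll number against CNDVI; an ordinary least-squares straight line is assumed here purely for illustration, and the zero-denominator guard in `ndvi` is likewise an assumption.

```python
from typing import Sequence, Tuple


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - R) / (NIR + R); returns 0.0 if both reflectances are zero (assumed guard)."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def cndvi(ndvi_by_period: Sequence[float], weights: Sequence[float]) -> float:
    """Comprehensive NDVI: weighted sum of the NDVI values of all periods."""
    if len(ndvi_by_period) != len(weights):
        raise ValueError("one weight per period is required")
    return sum(a * v for a, v in zip(weights, ndvi_by_period))


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares y = slope * x + intercept (the linear form is an assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In this scheme the CNDVI of each experimental sample area would serve as `x` and its measured average boll number as `y`; the fitted line then gives a predicted boll number for the CNDVI at each seedling position.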
Cotton yield estimation model. The SENP model is defined by formula (3), in which SENP is the predicted total cotton output of a given region, n is the total number of cotton seedlings in the region, C denotes cotton seedlings at different spatial positions, Y is the predicted yield per cotton plant at the corresponding position, T is the boll opening rate, N is the number of bolls, W is the weight of each boll, j is the number of sample areas, B is the number of opened bolls, A is the total number of bolls in a sample area, and L is a scaling factor.

Generally, the services provided by cloud computing can be divided into three layers: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [29]. The first layer is infrastructure, the second is platform, and the third is application. Infrastructure services include virtual or physical computers, block storage and network infrastructure (such as load balancing, content delivery networks and DNS resolution) [30]. Platform services include object storage, authentication and access services, runtimes for various programs, queue services, database services and so on [31]. Application software services cover many offerings, such as mail and code hosting services. Users can access these services through desktop computers, laptops, mobile phones, tablets and other Internet-connected devices. Amazon's cloud provides dozens of services [32], spanning IaaS, PaaS and SaaS.
In 2006, AWS began offering IT infrastructure services to enterprises in the form of web services, now commonly known as cloud computing. One of the main advantages of cloud computing is the ability to replace upfront capital infrastructure costs with low variable costs [33]. Instead of planning and purchasing servers and other IT infrastructure weeks in advance, companies can launch hundreds of servers in minutes and obtain results faster on a cloud computing platform [34]. In 2018, AWS launched 1,957 new services and features, innovating at an unmatched pace, especially in new areas such as machine learning and artificial intelligence. Today, AWS provides a highly reliable, scalable and low-cost infrastructure platform in the cloud that supports hundreds of thousands of businesses in 190 countries and regions, making it the most comprehensive and widely used cloud platform in the world [35].

ESE. The ENVI Services Engine (ESE) is an enterprise server product developed by Exelis VIS. ESE provides ENVI, IDL, SARscape and other remote sensing image processing capabilities as services, supporting online, on-demand remote sensing image applications [36]. It removes the barrier that professional remote sensing software and high-end hardware pose for non-specialists, and establishes more direct contact between remote sensing experts and prospective end users. ESE can be deployed in a variety of enterprise-level environments, including clusters, enterprise servers and cloud platforms [37][38], making full use of high-performance server hardware to process large volumes of remote sensing imagery efficiently.
ESE is built on mainstream REST frameworks and can run in clustered environments with scalability and load balancing. ESE receives HTTP/REST requests from the client, performs the requested remote sensing processing, and then returns the results to the application. ESE's image processing functions are packaged in the JSON standard and can be seamlessly integrated with image data services provided by other middleware (such as ArcGIS Server).

Fig. 6: Workflow of ESE
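The request/response pattern described for ESE (an HTTP/REST call from the client, processing on the server, JSON returned to the application) can be illustrated with Python's standard library. The host, port, route and JSON field names below are hypothetical placeholders, not the documented ESE API; the actual service routes and parameter names should be taken from the ESE documentation.

```python
import json
import urllib.request

# Hypothetical placeholders: not the documented ESE host, port or route.
ESE_BASE_URL = "http://ese-server.example:8181"


def build_task_request(task: str, parameters: dict) -> urllib.request.Request:
    """Build, without sending, a JSON POST request asking the server to run `task`."""
    url = f"{ESE_BASE_URL}/hypothetical/tasks/{task}"
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_task_response(raw: bytes) -> dict:
    """Decode the JSON document the server returns to the client application."""
    return json.loads(raw.decode("utf-8"))


req = build_task_request("ndvi", {"input_raster": "scene_2018_07_15.tif"})
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) would only succeed against a real server exposing such a route; the sketch is meant to show the shape of the exchange, not a working ESE client.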
Cloud platform construction. The experiment mainly used AWS and ESE to build a cloud computing platform for cotton yield estimation. The back end of the platform mainly uses the Interactive Data Language (IDL) to implement custom applications, such as NDVI calculation, growth-state classification and the SENP model calculation, while the front end mainly uses JavaScript to build custom web applications, including map loading and presentation of yield results. The experiments used Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3). EC2 is a web service that provides scalable computing capacity in the cloud and is designed to give developers easy access to web-scale computing [39]. S3 is an Internet storage service that can store and retrieve data from anywhere on the web at any time. Both the AWS global regions and the AWS China regions were used. The global regions were mainly used to download Sentinel-2 data, calculate NDVI and store the data, and then pass the results to the China regions, where the UAV data and multi-temporal NDVI data were processed and the SENP model was used to estimate cotton yield. The final results can be viewed in real time via the web on a computer, tablet or mobile phone.
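The global-region step that downloads Sentinel-2 data can be sketched as building S3 object keys for the bands needed for NDVI (B04 red and B08 near-infrared, both at 10 m). This assumes the tile layout used by the public `sentinel-s2-l1c` bucket (`tiles/<UTM zone>/<latitude band>/<grid square>/<year>/<month>/<day>/<sequence>/`), which should be checked against the bucket's current documentation. The grid-square letters and acquisition date in the example are placeholders, and the zone and band helpers ignore the special UTM zones around Norway and Svalbard.

```python
from datetime import date

# MGRS latitude bands: 8 degrees each from 80 S; band X is extended to 84 N.
LAT_BANDS = "CDEFGHJKLMNPQRSTUVWX"


def utm_zone(lon: float) -> int:
    """Standard 6-degree UTM zone number (Norway/Svalbard exceptions ignored)."""
    return int((lon + 180.0) // 6.0) + 1


def lat_band(lat: float) -> str:
    """MGRS latitude band letter for latitudes between 80 S and 84 N."""
    index = int((lat + 80.0) // 8.0)
    return LAT_BANDS[min(index, len(LAT_BANDS) - 1)]


def s2_band_key(zone: int, lat_letter: str, square: str, day: date,
                band_name: str, sequence: int = 0) -> str:
    """Object key for one band of one tile, assuming the sentinel-s2-l1c layout."""
    return (f"tiles/{zone}/{lat_letter}/{square}/"
            f"{day.year}/{day.month}/{day.day}/{sequence}/{band_name}.jp2")


zone, letter = utm_zone(86.02), lat_band(44.49)  # approximate centre of the study area
for name in ("B04", "B08"):                      # red and NIR bands used for NDVI
    # "AB" is a placeholder grid square, not the real tile covering the study area.
    print(s2_band_key(zone, letter, "AB", date(2018, 7, 15), name))
```

The resulting keys could then be fetched with an S3 client such as boto3 on an EC2 instance in the same region, which keeps the bulk download inside AWS before only the derived NDVI is passed on.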

E. Accuracy Assessment
To verify rigorously the reliability of the yield estimation model and the feasibility of the cloud platform, the precision evaluation followed strict standards. Three indexes, precision, recall and F1 value, were used to evaluate the accuracy of cotton seedling extraction. For cotton yield estimation, the coefficient of determination (R²) and the root mean square error (RMSE) were selected to evaluate the results.

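$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

$$R^2 = \frac{\left[\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$

where TP is the number of cotton seedlings correctly extracted, FP is the number of cotton seedlings wrongly extracted, and FN is the number of cotton seedlings not extracted. N is the number of samples, $\hat{y}_i$ and $y_i$ represent the predicted and actual yield of sample $i$, and $\bar{\hat{y}}$ and $\bar{y}$ are the averages of the predicted and measured yields, respectively.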
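The accuracy indexes above translate directly into code. This Python sketch computes R² as the squared Pearson correlation between predicted and measured yields, which is the form implied by the use of both averages in the definitions; the function names are illustrative.

```python
import math
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """Share of extracted seedlings that are real seedlings."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of real seedlings that were extracted."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def r_squared(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Squared Pearson correlation between predicted and measured yields."""
    n = len(pred)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error over the N validation samples."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))
```

The squared-correlation form of R² is insensitive to systematic bias (a prediction that is consistently too high by a constant still scores 1.0), which is one reason RMSE is reported alongside it.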

A. Seedling emergence and extraction results
The emergence of cotton seedlings is a key link in the construction of the SENP model and largely determines the final estimation results, so the method and results of seedling extraction are crucial. At present, most studies extract crops by calculating vegetation indices from spectral information, and most have achieved good results [40][41][42]. However, research on the extraction of cotton seedlings is still scarce. In this paper, high-resolution UAV data were used, and the spatial position of each cotton seedling in the region was extracted by deep learning. The study area was about 637.08 mu, and a total of 4,364,255 cotton seedlings were extracted, a density of about 6,850 plants per mu. Validation against the measured data shows that the method is highly accurate: the precision is 93.88%, the recall is 97.87% and the F1 value is 95.83%. These results show that the U-Net model can effectively extract cotton seedling emergence information. The method not only supports the construction of SENP but also offers a new approach to cotton seedling extraction.
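As a consistency check on the figures reported above, the F1 value follows from the reported precision and recall, and the planting density follows from the seedling count and the study-area size:

$$F_1 = \frac{2 \times 0.9388 \times 0.9787}{0.9388 + 0.9787} = \frac{1.8376}{1.9175} \approx 0.9583$$

$$\text{density} = \frac{4\,364\,255\ \text{seedlings}}{637.08\ \text{mu}} \approx 6\,850\ \text{seedlings per mu}$$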

B. Growth monitoring results
The multi-temporal growth monitoring of cotton is another vital link in the construction of the SENP model. Vegetation growth is an extremely complex process: changes in soil, water or chlorophyll conditions may all affect the final yield, so the growth of cotton needs to be monitored as a time series. The NDVI of each period was calculated and analyzed from Sentinel-2 data covering 10 periods. The correlation between each period's NDVI and the actual yield of the 40 experimental sample areas was then calculated to determine the weight of each selected period. By assigning these weights to the multi-temporal NDVI images, a single image (CNDVI) was used to classify the growth state of cotton accurately, objectively and reasonably. The correlation coefficients between NDVI and yield for the 10 periods are 0.69, 0.72, 0.75, 0.81, 0.88, 0.87, 0.82, 0.83, 0.75 and 0.72, respectively. The correlation between NDVI and actual yield is relatively large in the boll period and relatively small in the bud stage and the boll opening period. Therefore, according to the magnitude and proportion of the correlation coefficients, weights of 0.09, 0.09, 0.10, 0.10, 0.11, 0.11, 0.10, 0.11, 0.10 and 0.09 were assigned to the NDVI images of the respective periods to construct CNDVI. The experimental results show that estimates based on multiple periods are more accurate than those based on a single period.
(b) Linear correlation between NDVI and actual yield in each period
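The weighting rule is stated only as following the size and proportion of the correlation coefficients. Normalising each coefficient by the sum of all ten and rounding to two decimals, as in the Python sketch below, reproduces the reported weights exactly, so this reading is consistent with the text, although the paper does not state the formula explicitly.

```python
from typing import List, Sequence

# Correlation coefficients between NDVI and yield reported for the 10 periods.
R_BY_PERIOD = [0.69, 0.72, 0.75, 0.81, 0.88, 0.87, 0.82, 0.83, 0.75, 0.72]


def correlation_weights(coefficients: Sequence[float], ndigits: int = 2) -> List[float]:
    """Weight of each period = its coefficient divided by the sum of all coefficients."""
    total = sum(coefficients)
    return [round(r / total, ndigits) for r in coefficients]


print(correlation_weights(R_BY_PERIOD))
# [0.09, 0.09, 0.1, 0.1, 0.11, 0.11, 0.1, 0.11, 0.1, 0.09]
```

The rounded weights sum to 1.00, so CNDVI stays on the same scale as a single-date NDVI.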

C. Output Estimation Results
Once the results of seedling extraction and multi-temporal growth monitoring were obtained, the yield estimation model could be constructed, and the yield of the study area was estimated with formula (3). For the total area of 637.08 mu, the total cotton output is 261,200.75 kg. The results show that cotton yield in this region is relatively high and that yield is positively correlated with cotton growth: the better the growth, the higher the yield, which agrees with actual production. The results also indirectly confirm that estimating yield with SENP is feasible. To verify the accuracy quantitatively, the predicted yields were regressed against the actual yields of the 20 validation samples, with R² and RMSE as indicators of model reliability. The closer R² is to 1, the better the fit; a larger R² together with a smaller RMSE indicates stronger predictive ability. The R² of the yield estimation reached 0.92 and the RMSE was only 6.04, indicating that cotton yield estimation with the SENP model is highly accurate.
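Formula (3) itself is not reproduced in the text, so the following Python sketch only assumes a multiplicative form consistent with the variable definitions given for SENP: the boll opening rate T is pooled as opened bolls B over all bolls A across the j sample areas, each seedling's yield is Y = N × W × T × L, and SENP sums Y over all n seedlings. These choices, the gram-to-kilogram conversion, the toy input values and the function names are assumptions for illustration.

```python
from typing import Sequence


def boll_opening_rate(opened: Sequence[int], total: Sequence[int]) -> float:
    """T: opened bolls (B) over all bolls (A), pooled over the j sample areas (assumed)."""
    if len(opened) != len(total):
        raise ValueError("need one B and one A value per sample area")
    return sum(opened) / sum(total)


def senp_total_kg(bolls_per_plant: Sequence[float], boll_weight_g: float,
                  opening_rate: float, scale: float = 1.0) -> float:
    """SENP: sum over the n seedlings of Y = N * W * T * L, converted from grams to kg."""
    total_g = sum(n_bolls * boll_weight_g * opening_rate * scale
                  for n_bolls in bolls_per_plant)
    return total_g / 1000.0


# Toy example with made-up values: two sample areas and two seedlings.
T = boll_opening_rate(opened=[8, 9], total=[10, 10])  # 17 / 20
print(senp_total_kg([5.0, 6.0], boll_weight_g=5.0, opening_rate=T))
```

For scale, the reported total of 261,200.75 kg over 637.08 mu corresponds to roughly 410 kg per mu.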
In the past, deep learning and time-series computations were extremely time-consuming and required strong computing capacity. By contrast, this experiment takes full advantage of AWS, so that the whole process from data download to presentation of the yield estimate takes just 22 seconds and runs automatically. Users can conveniently view the results of cotton growth monitoring and final yield estimation by logging on to the web platform. The platform is therefore highly efficient, which demonstrates the potential and advantages of combining SENP with AWS.

D. Cloud Platform Display
Based on AWS and ESE, this experiment established an online cloud platform for cotton yield estimation. The back end of the platform is based on IDL, and the front end is mainly based on JavaScript. When users log on to the platform over the Internet, they can not only obtain cotton yield estimates but also use numerous other functions, such as searching and managing land information, transmitting and viewing sensor information, obtaining meteorological data and generating result reports.

IV. Discussion
Predicting cotton yield is complicated work that requires considering not only the practicability and feasibility of the technology but also the credibility and accuracy of the prediction results. This paper proposes a new cotton yield estimation model named SENP. The method has reference value for accurate yield estimation, but potential factors that may affect the estimation results still need further exploration and research.
(1) UAVs are an effective way to acquire high-resolution data and have many advantages, but their data acquisition capability is constrained by several factors. If future research extends cotton yield estimation to larger areas, data acquisition may become a limitation.
(2) The experiment used NDVI data of 10 periods to monitor and analyze cotton growth and achieved good results. However, the optimal timing of cotton monitoring has not been studied systematically. In the future, we will use more data acquired at different times to monitor cotton growth and explore whether the accuracy of the model can be improved.
(3) Cloud computing has many advantages, and the platform based on AWS and ESE can compute yield estimation results efficiently and rapidly. In later work, the interface display and graphics processing algorithms could be further optimized to increase computation speed and improve the user experience.

V. Conclusions
Taking full advantage of cloud computing, this paper presents a new cotton yield estimation model based on AWS, which offers a new idea for innovative applications of cloud computing platforms and for research on cotton yield estimation. The main conclusions of this research are as follows:
(1) For high-resolution UAV data, the U-Net model can effectively extract cotton seedling emergence information, accurately obtain the spatial position of each cotton seedling and count the total number of cotton plants in the region.
(2) Accurate monitoring of cotton growth is conducive to establishing the model. Evaluating the cotton state with NDVI data from a single period is neither representative nor precise, whereas time-series NDVI data provide a better way to monitor cotton growth.
(3) The experimental results demonstrate that it is feasible to estimate yield from information on cotton emergence and growth. Validation against actual yield confirmed that the SENP-based cotton yield estimation model is reliable and highly accurate.
(4) Taking full advantage of cloud computing, an online cotton yield estimation platform based on AWS and ESE was established, which can provide reference information for regional agricultural management and macro decision-making. It has played an active role in advancing precision agriculture in China.

Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.