Sugarcane Bud Detection Method Based on YOLOv3-CSE Network

It is specified in the agronomic requirements of sugarcane sowing that sugarcane buds should be placed toward the walls on both sides of the sowing ditch, while traditional detection models for small sugarcane bud targets cannot meet the requirements of the intelligent directional seeding machine for sugarcane bud identification due to such shortcomings as low accuracy, low recognition speed, and low training speed. To this end, a network model targeting sugarcane buds, called YOLOv3-CSE, was proposed in this paper. On the basis of analyzing the advantages and disadvantages of the YOLOv3 network, the original YOLOv3 network was improved to achieve accurate and rapid identification of small and medium-sized sugarcane bud targets. To further enhance the detection ability of the model for small object regions such as sugarcane buds, the original DarkNet-53 network structure was improved and the complete intersection over union (CIoU) bounding box regression loss function was adopted to make the ground truth box regression more stable, thus avoiding IoU divergence in the training process and ameliorating the regression effect on sugarcane bud identification. The Mosaic data augmentation method was applied to enrich data diversity, so as to overcome the inadequate generalization ability observed when training on small datasets. Finally, the SE-ResNet module was embedded to increase the ability of the network model to identify sugarcane bud features. The test results of the YOLOv3-CSE network and the original YOLOv3 network indicated that the precision and mean average precision (mAP) of the YOLOv3-CSE network were 96.93% and 95.87%, which were 5.66% and 4.95% higher, respectively, than those of the original YOLOv3 network. Compared with other object detection models on the same dataset, the YOLOv3-CSE network proposed in this paper boasts stronger robustness, better real-time performance, higher precision and higher detection velocity in identifying small objects such as sugarcane buds. 
In addition, it can rapidly identify the sugarcane buds, providing a technical guarantee for the application of the intelligent directional seeding machine for sugarcane seeds.


Introduction
Sugarcane, one of the main economic crops in the world, has a planting area of about 300 million mu worldwide and up to 22 million mu in China, ranking third in the globe. Sugarcane planting is one of the most important links of production, and its quality directly affects the yield of the year. It is specified in agronomic requirements that during planting, sugarcane seeds should be laid flat, with the buds facing the bilateral walls of the seed ditch, and the sugarcane buds shall not be placed towards the bottom of the seed ditch, so as to promote early germination and emergence of sugarcane buds and improve the germination rate of sugarcane seeds 1. The existing sugarcane planting machines at home and abroad utilize the blind planting method and lack a sugarcane bud identification function. They cannot meet the agronomic requirements for planting, namely that the sugarcane seeds should be laid flat and the sugarcane buds placed toward the walls of the seed ditch 2, so problems including late germination, slow rooting and a low germination rate of the sugarcane seeds may arise, directly influencing the sugarcane yield. Currently, the directional planting of sugarcane seeds is mainly realized by identifying the sugarcane buds with human eyes, manually determining the direction of the sugarcane buds and placing the sugarcane seeds; such factors as high labor intensity and low work efficiency seriously hinder the development of the sugarcane industry. Moreover, the detection and recognition of sugarcane seeds at home and abroad primarily focus on stem node identification at present 3,4, aiming to realize automatic sugarcane seed cutting, and there is no related research report on the mechanized directional planting of sugarcane seeds. Significant breakthroughs have been made in deep learning technology for object detection in recent years 5. 
There are three major categories of object detection algorithms: multi-stage algorithms, typified by Cascade-regions with convolutional neural network features (Cascade RCNN) 6; two-stage algorithms, typified by Faster RCNN 7 and Mask RCNN 8; and one-stage algorithms, typified by the YOLO series 9-13, RetinaNet 14 and SSD 15. Among them, multi-stage and two-stage algorithms are sparse prediction models, while one-stage algorithms belong to dense prediction models. Deep learning, characterized by automatic feature extraction, can greatly improve the efficiency and precision of object detection 16, thereby promoting the wide application of object detection in agriculture. For this purpose, the authors of Ref. 17 established a CNN classification model to distinguish good buds and bad buds on sugarcane seeds by reference to the LeNet-5 network structure. The authors of Ref. 18 simulated different light conditions through dataset expansion and identified sugarcane stem nodes using four network models; it was shown that the YOLOv4 network has the best performance in identifying sugarcane stem nodes, with a detection velocity of 69 f/s and a precision of 95.12%. Moreover, a support vector machine (SVM) was employed in Ref. 19 to detect the locations of field weeds and maize seedlings by means of K-means clustering-based image segmentation combined with a multi-feature fusion method; the results indicated that the rotation-invariant LBP feature combined with GGCM can produce an average precision of up to 97.50% in identifying maize seedlings and weeds. In Ref. 20, the parameters (temperature, humidity, and moisture) for the healthy growth of sugarcane crops were continuously monitored by virtue of KNN clustering and an SVM classifier, and the results revealed that the accuracy of the model is as high as 96%. 
Furthermore, for apple fruit identification in complicated orchard environments, Zhao et al. and Li et al. 21,22 worked with improved YOLOv3 networks. The first study optimized the model by combining the residual modules in the DarkNet-53 network with the Cross Stage Partial Network (CSPNet), introducing the Spatial Pyramid Pooling (SPP) module, substituting the Soft Non-Maximum Suppression (NMS) algorithm for the traditional NMS algorithm, and finally applying the joint loss function of Focal Loss and complete intersection over union (CIoU) Loss; it was uncovered that the mean average precision (mAP), F1-score and detection velocity reached 96.3%, 91.8% and 27.8 f/s, respectively. The other study modified the YOLOv3 network to increase the real-time dynamic identification efficiency of sugarcane stem nodes, changing the size of the output feature map and reducing the number of anchors by decreasing the number of residual structures constituted by intermediate convolution layers in the YOLOv3 network; the research results demonstrated that the mAP is 90.38%, and the average time consumption for identifying the sugarcane stem nodes is 28.7 ms. The team has carried out research on an intelligent directional seeding machine for sugarcane seeds that meets the agronomic requirements of directional planting of sugarcane buds. However, Li Qiang et al. 23 realized the identification and positioning of sugarcane buds based on convolutional neural networks and an improved LeNet-5 network model. 
It was denoted that the recognition accuracy of the sugarcane bud position is slightly lower (only 92%), and the average detection time of a single sugarcane seed image is as long as 1.2 s. Besides, this method can only identify sugarcane buds under static conditions, and cannot achieve real-time dynamic detection on the intelligent directional seeding machine for sugarcane seeds. Therefore, in order to meet the agronomic requirements for directional sowing of sugarcane seeds and realize intelligent directional planting, the rapid and dynamic detection and identification of sugarcane buds is the key technical problem to be solved first. It is difficult for traditional identification methods to achieve precise, quick and real-time identification of sugarcane buds on sugarcane seeds because the buds have small sizes and different shapes, they grow on the sugarcane stem nodes, there are off-white wax powder and leaf scars at the stem nodes, and dark black bulges exist at some sugarcane stem nodes. Hence, the stem node algorithms proposed in the above literature cannot preferably solve the problem of detecting small objects such as sugarcane buds, which has become a difficulty in the rapid and accurate identification of sugarcane buds on sugarcane seeds. In this paper, a sugarcane bud detection method based on the YOLOv3-CSE network was proposed for sugarcane bud identification, and the network model was modified as follows: (1) The original DarkNet-53 network structure was improved to enhance the small object detection ability. (2) The bounding box regression loss function was improved to strengthen the regression effect on sugarcane bud identification. (3) The Mosaic data augmentation method was introduced to enrich the diversity of data. (4) The SE-ResNet module was embedded to increase the ability of the network model to identify sugarcane bud features. 
It was verified through research that the method boasts strong robustness, good instantaneity, high precision and high detection velocity in terms of sugarcane bud identification, providing a key technology for realizing intelligent identification of sugarcane buds on sugarcane seeds as well as automatic directional planting.

Materials and methods
The project team conducted a study on the detection and identification of sugarcane buds with the intelligent directional planting machine for sugarcane seeds as a platform 24. The sugarcane seeds used in this study came from the National Agricultural Science and Technology Park in Guilin, Guangxi, China (25°06′N, 110°31′E), and were used for the image acquisition required to train the rapid sugarcane bud identification model. The sugarcane seeds were collected between 2 pm and 6 pm on Sunday, October 24, 2021. The whole sugarcane was manually cut into double-bud segment cane seeds 25 in strict accordance with the agricultural industry standard of the People's Republic of China (Seedling of sugarcane, NY/T 1796-2009) 26. The double-bud segment sugarcane seeds were 30-40 cm long 27, and after processing and screening they were placed on the conveying device of the experimental table for image acquisition (Figure 1). The camera was placed 200-250 mm above the sugarcane buds to ensure that the images obtained reflected different photographic distances, lighting conditions, and photographic angles. The image resolution was set to 3000 pixels × 3000 pixels. A total of 2,200 images were collected, of which 2,036 passed manual screening and were labeled, and all the images were saved in JPG format. The images were divided into a training set (n=1,629) and a test set (n=407) at a ratio of 8:2. Part of the image collection is shown in Figure 2. The images were manually labeled by means of LabelImg (https://github.com/tzutalin/labelImg). The result documents were saved in XML format, and the stored information included the path, width, height and number of channels of the images, as well as the object box locations of the sugarcane buds. 
Finally, the images in JPG format and the documents in XML format were sorted and renamed to guarantee the matching of images and documents, and the datasets were saved in PASCAL VOC 28 format to facilitate the training and testing of the YOLOv3-CSE network.
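As a sketch of the 8:2 division described above, the snippet below randomly splits 2,036 labeled image IDs into training and test sets. The helper name and random seed are illustrative, not from the paper:

```python
import random

def split_dataset(image_ids, train_ratio=0.8, seed=42):
    """Shuffle labeled image IDs and split them into training/test sets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]

train_ids, test_ids = split_dataset(range(2036))
print(len(train_ids), len(test_ids))  # 1629 407
```

Fixing the seed makes the split reproducible, so the same training/test partition can be regenerated for every experiment.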

The working principle of intelligent directional sowing of sugarcane seeds
The intelligent directional planting machine for sugarcane seeds was designed to realize the intelligent directional sowing of sugarcane seeds, and its working principle is shown in Figure 2. The machine was composed of a seed metering and conveying device, a CCD camera, a sugarcane bud direction-adjusting device, a pendulum-type retrieving and throwing mechanism and a rack. The sugarcane bud direction-adjusting device consisted of a servo motor, a two-finger clamping cylinder, a lifting cylinder and other components. The working principle of the machine can be described as follows: the sugarcane seeds were transported to the designated position by the seed metering and conveying device, and then one end of the sugarcane seeds was clamped by the sugarcane bud direction-adjusting device and moved upward for a certain distance (to ensure that the sugarcane seeds were not interfered with by the seed metering and conveying device during the direction adjustment process). Next, the CCD camera was triggered to capture the sugarcane seed images, the sugarcane bud position on the sugarcane seeds was identified, and the sugarcane seed steering angle was calculated. The sugarcane bud direction-adjusting device was applied to rotate the sugarcane seeds so that they were oriented at the same angle, and the sugarcane seeds were clamped by the pendulum-type retrieving and throwing mechanism for seeding and planting, so as to realize the intelligent directional sowing of the sugarcane seeds.

Automatic sugarcane planting scheme
In order to tackle the problems of intelligent identification and automatic directional planting of sugarcane buds, the technical scheme shown in Figure 3 was adopted in this study to realize the image acquisition of sugarcane seeds, rapid identification of sugarcane buds, adjustment of the sugarcane bud direction and planting of sugarcane seeds. Firstly, the images of sugarcane seeds were collected using an image acquisition system, and then preprocessed and divided into datasets. Secondly, the YOLOv3 network (a typical object detection algorithm) was adopted for sugarcane bud identification, so as to achieve fast and precise identification of sugarcane buds on sugarcane seeds. As for the problems of the original YOLOv3 network in sugarcane bud identification, such as missed detection of small sugarcane bud objects, low identification speed and precision, and large weight files, the original model was improved, and the proposed improvement methods were compared and verified via data analysis. To transplant the network model into embedded devices at a later stage, the sugarcane seeds were captured and their bud direction adjusted through an end effector, thus meeting the premise of the agronomic requirements for directional planting of sugarcane seeds.

Model modification
Basic YOLOv3 network
The YOLOv3 network is an improved version of the YOLO network, a type of object recognition algorithm proposed by Joseph Redmon and Ali Farhadi in 2018 11, which is characterized by higher identification accuracy and speed than other networks. The framework of the YOLOv3 network consists of two parts, namely, the backbone feature extraction network and the detectors (Figure 4). (1) The backbone feature extraction network used the DarkNet-53 network as the backbone to extract the features of object images. It was composed of a 3×3 convolution layer and 5 stages of residual structures. The numbers of residual structures in each stage were 1, 2, 8, 8 and 4, respectively, and each residual structure consisted of a 3×3 convolution layer and a residual block. Moreover, each residual block contained a 1×1 convolution layer, a 3×3 convolution layer and a summing operation.
(2) The detectors constituted three branches generated by the YOLOv3 network; the branches had feature maps of 3 different sizes (13×13, 26×26 and 52×52), and each branch contained a convolution set, a 3×3 convolution layer and a 1×1 convolution layer. Among them, the convolution set was made of consecutive 1×1, 3×3, 1×1, 3×3, and 1×1 convolution layers. Through network computation, every feature map output 3 bounding boxes, each of which predicted the center coordinates (x and y), width (w), height (h), confidence and category information. In the YOLOv3 network, the DarkNet-53 network consisted of 53 convolution layers in total, which were mainly used to extract the object features. First of all, the size of the original images was adjusted to the input size, and the channel number of the feature model was increased using a scaled pyramid structure similar to the FPN network 29 and a 3×3 convolution kernel. Next, a 1×1 convolution kernel was utilized to reduce the channel number of the feature model. At last, 3 feature maps with detection scales of 52×52, 26×26 and 13×13 were obtained and then mutually fused, so that the model could detect objects of different sizes. For small objects such as sugarcane buds, however, missed detection and the inability to meet the precision requirements remain problems even if the 52×52 feature map is used for output.
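To illustrate how a detector branch turns one raw prediction into a box, the sketch below applies the standard YOLOv3 decoding (sigmoid on the centre offsets and confidence, exponential scaling of the anchor). The grid-cell position, anchor size and stride are illustrative values, not measurements from the paper:

```python
import math

def decode_prediction(tx, ty, tw, th, to, cx, cy, anchor_w, anchor_h, stride):
    """Decode one raw YOLOv3 output (tx, ty, tw, th, to) predicted by the
    grid cell (cx, cy) into a box in input-image pixel coordinates."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = (sigmoid(tx) + cx) * stride   # box centre x
    by = (sigmoid(ty) + cy) * stride   # box centre y
    bw = anchor_w * math.exp(tw)       # box width, scaled from the anchor
    bh = anchor_h * math.exp(th)       # box height, scaled from the anchor
    conf = sigmoid(to)                 # objectness confidence
    return bx, by, bw, bh, conf

# A cell at (6, 6) on the 13x13 map (stride 32) with all-zero raw outputs
# predicts a box centred at (208, 208) with the anchor's own size:
print(decode_prediction(0, 0, 0, 0, 0, 6, 6, 116, 90, 32))
# (208.0, 208.0, 116.0, 90.0, 0.5)
```

The sigmoid keeps the predicted centre inside its own grid cell, which is why the cell containing the object's centre is the one responsible for predicting it.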

Improvement of feature layers of YOLOv3 network
The sugarcane bud objects were identified based on the original YOLOv3 network. Specifically, at each prediction scale, 3 bounding boxes were predicted by virtue of the 3 anchors in each grid cell, so the YOLOv3 network was capable of identifying input images of any resolution. According to Figure 5, if the center of an object fell into a grid cell, that cell would be responsible for predicting the object. On the basis of the original YOLOv3 network, the original network feature layers were modified in this study: by increasing the scale of the feature maps, shallow-layer information could be better utilized to improve the detection ability for small sugarcane buds. As the target sugarcane buds were very small in the experimental table scene, the output of feature maps with a scale of 13×13 or smaller could be ignored. In addition, the up-sampling of the feature maps at the scale of 52×52 was combined with the output of the 32nd layer in this study, and a third new output scale (104×104) of feature maps was obtained through convolution operation. Meanwhile, the output of the feature maps with a scale of 13×13 and the 4 remaining residual units at the end of the original DarkNet-53 network were removed.
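The fusion step behind the new 104×104 branch can be sketched as nearest-neighbour up-sampling followed by channel concatenation with a shallow feature map. This is a minimal sketch assuming the usual YOLOv3 fusion style; the channel counts are illustrative:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_branch(deep_52, shallow_104):
    """Up-sample a 52x52 deep map to 104x104 and concatenate it with a
    104x104 shallow map along the channel axis, giving the new branch."""
    up = upsample2x(deep_52)
    assert up.shape[1:] == shallow_104.shape[1:]
    return np.concatenate([up, shallow_104], axis=0)

deep = np.ones((128, 52, 52))       # deeper, semantically rich features
shallow = np.zeros((128, 104, 104)) # shallow features with fine detail
fused = fuse_branch(deep, shallow)
print(fused.shape)  # (256, 104, 104)
```

The concatenated map keeps the fine spatial detail of the shallow layer while carrying the deep layer's semantics, which is what makes the larger-scale output useful for very small buds.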

Improvement of bounding box regression loss function
IoU 30, published in 2016 and widely applied since then, is an algorithm for calculating the overlap ratio of different images, which is frequently used for object detection in the field of deep learning. The calculation method is shown in Eq. (1):

IoU = |M ∩ N| / |M ∪ N|    (1)

where M=(x, y, w, h) stands for the prediction box, and N=(x_gt, y_gt, w_gt, h_gt) represents the ground truth box. Besides, x and y are the abscissa and ordinate, respectively, of the center of the bounding box, and w and h refer to its width and height, respectively. IoU is adopted in the original YOLOv3 network, where it is applied to measure the relative size of the overlap of two bounding boxes. In other words, a larger overlap between the prediction box and the ground truth box signifies a better prediction effect of the object detection network model. However, IoU has two obvious shortcomings: (1) In the case of non-intersection between the prediction box and the ground truth box, the distance between the two boxes cannot be reflected, the loss function is not differentiable, and the non-intersection cannot be optimized. At this time, there is no gradient echo, and divergence occurs, thus disabling normal training.
(2) As for the intersection between the prediction box and the ground truth box, when the IoU value remains the same, it cannot reflect the way of intersection between the two boxes. To solve the above two shortcomings, IoU was improved to CIoU in this study 31 , and the center distance, overlap rate and scale of the prediction box and the ground truth box were considered to make the regression of the ground truth box more stable and avoid IoU divergence during training, leading to faster and more accurate training convergence. CIoU loss was defined according to Eq. (2).
Where α is the weight parameter, expressed in Eq. (3), and v is the parameter used to measure the consistency of the length-width ratio, expressed in Eq. (4). Moreover, v involves the w and h to be predicted, and its partial derivatives with respect to w and h are shown in Eqs. (5)-(6), respectively. m and n represent the centers of the prediction box M and the ground truth box N, respectively, ρ is the Euclidean distance, and c is the diagonal length of the smallest enclosing box covering the two boxes. In Figure 6, the upper left box represents the ground truth box, the lower right box stands for the prediction box, the outermost dashed box means the minimum bounding rectangle, and d refers to the Euclidean distance between the centers of the two boxes.

Where, w gt and h gt indicate the width and height of the ground truth box, respectively, and w and h are the width and height of the prediction box, respectively.
Generally, when the predicted values of w and h are very small, the value of 1/(w²+h²) becomes very large, which can cause gradient explosion. To avoid such a situation, w²+h² is usually replaced by the constant 1 in the CIoU loss.
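Since Eqs. (2)-(4) follow the standard CIoU formulation (1 − IoU plus a normalized centre-distance term plus a weighted aspect-ratio term), the loss can be sketched as below. Boxes are given as centre coordinates plus width and height, and the sample boxes are illustrative:

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss between a predicted box and a ground-truth box,
    each given as (cx, cy, w, h)."""
    (xp, yp, wp, hp), (xg, yg, wg, hg) = box_p, box_g
    # intersection over union
    ix = max(0.0, min(xp + wp/2, xg + wg/2) - max(xp - wp/2, xg - wg/2))
    iy = max(0.0, min(yp + hp/2, yg + hg/2) - max(yp - hp/2, yg - hg/2))
    inter = ix * iy
    iou = inter / (wp*hp + wg*hg - inter)
    # squared centre distance over squared diagonal of the enclosing box
    cw = max(xp + wp/2, xg + wg/2) - min(xp - wp/2, xg - wg/2)
    ch = max(yp + hp/2, yg + hg/2) - min(yp - hp/2, yg - hg/2)
    rho2 = (xp - xg)**2 + (yp - yg)**2
    c2 = cw**2 + ch**2
    # aspect-ratio consistency term v and its weight alpha
    v = (4 / math.pi**2) * (math.atan(wg/hg) - math.atan(wp/hp))**2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v

# identical boxes: IoU = 1, distance and shape terms vanish, loss = 0
print(ciou_loss((50, 50, 20, 10), (50, 50, 20, 10)))  # 0.0
```

Unlike plain IoU, the centre-distance term stays informative even when the boxes do not overlap, which is what keeps the regression gradient alive during early training.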

Mosaic data augmentation method
In terms of image identification and deep learning, the quality of datasets also affects the robustness and accuracy of the neural network model, so data augmentation of image datasets is usually necessary to enhance the training of neural networks 32,33. Currently, photometric transformation and geometric transformation are the commonly used data augmentation methods. The former mainly focuses on the Hue, Saturation and Value (HSV) color space of pictures, especially the random adjustment of the V, H and S parameters. The latter mainly targets the random scaling, rotation, translation, occlusion and clipping of pictures. In this study, the Mosaic data augmentation method, an improved version of the CutMix 34 data augmentation method to which it is theoretically similar, was introduced into the original YOLOv3 network. The CutMix data augmentation method processed two pictures: a small area of one picture was selected for masking, and the corresponding small area of another picture was used to cover it (Figure 7). In the Mosaic data augmentation method, four pictures were processed, combining the advantages of photometric transformation and geometric transformation. The operations in the workflow were all random, which remarkably increased the environmental complexity of the pictures, helped improve the diversity of the datasets and enhanced the object identification ability. The fundamental principle of Mosaic data augmentation is that an intact new picture is formed by randomly selecting, clipping and splicing four pictures of the same size from the original dataset (Figure 8). To be specific, four pictures were randomly selected from the original dataset, and every picture was clipped at a random size. 
Then the first clipped picture was placed in the upper left corner, and the second, third and fourth pictures were placed in the lower left corner, lower right corner and upper right corner, respectively, after the same treatment. Finally, a picture was obtained after Mosaic augmentation, whose pixel size was identical to that of each of the original four pictures. The workflow is exhibited in Figure 9.
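The splicing step above can be sketched as follows: a random split point divides the output canvas into four quadrants, and each source image fills one quadrant in the stated order. This is a minimal sketch with an illustrative canvas size; a full implementation would also remap the bounding-box labels into the new canvas:

```python
import random
import numpy as np

def mosaic(images, out_size=416, seed=None):
    """Splice four equally sized (H, W, 3) images into one mosaic of the
    same size. Quadrant order: upper-left, lower-left, lower-right,
    upper-right."""
    rng = random.Random(seed)
    s = out_size
    cx = rng.randint(s // 4, 3 * s // 4)   # random vertical split line
    cy = rng.randint(s // 4, 3 * s // 4)   # random horizontal split line
    canvas = np.zeros((s, s, 3), dtype=images[0].dtype)
    canvas[:cy, :cx] = images[0][:cy, :cx]   # upper left
    canvas[cy:, :cx] = images[1][cy:, :cx]   # lower left
    canvas[cy:, cx:] = images[2][cy:, cx:]   # lower right
    canvas[:cy, cx:] = images[3][:cy, cx:]   # upper right
    return canvas

imgs = [np.full((416, 416, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
out = mosaic(imgs, seed=0)
print(out.shape)  # (416, 416, 3)
```

Because four scenes appear in every augmented picture, each training batch effectively sees four times as many object contexts, which is the source of the diversity gain described above.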

SE-ResNet module embedding
In images with a complex environmental distribution, the traditional YOLOv3 network usually suffers from phenomena such as identification errors and missed detection of sugarcane buds due to the unbalanced confidence distribution. Hence, it is very necessary to embed the SE-ResNet module of Squeeze-and-Excitation Networks (SENet) to improve the identification accuracy and speed of the network. SENet 35, a CNN attention mechanism structure proposed by Jie Hu et al. from Momenta, is used to emphasize informative features and inhibit the features of non-object information by learning to use global information. There are two core operations in SENet, namely squeeze and excitation. As shown in Figure 10 35, Ftr denoted the transformation from the input to the intermediate feature maps, Fsq represented the squeeze operation, Fex indicated the excitation operation, X stood for the input, and U meant the result of the intermediate transformation. Additionally, H, W and C were the height, width and number of channels of the feature maps. In the squeeze operation, the feature maps of H×W×C were changed into those of 1×1×C through global average pooling, so as to obtain the global feature at the channel level, meaning that the H×W pixels were compressed into 1×1 pixel. The excitation operation aimed to obtain the relationship between channels after the global features were acquired. In this operation, a bottleneck structure of two fully connected (FC) layers was employed. The first FC layer reduced the dimension with a dimensionality reduction coefficient r (a hyperparameter) of 16, which was then activated by ReLU. The second FC layer restored the original dimension, and the activation value of each channel was ultimately obtained by means of Sigmoid activation. In the last scale operation, the weight output by the excitation operation was regarded as the importance of each feature channel after feature selection, and was weighted onto the original features channel by channel using multiplication. This means that different channels are given different weights, and the darker the color is, the greater the weight will be. The whole process could be regarded as learning the weight coefficient of each channel, so that the model was more capable of identifying the features of every channel. Based on the idea of SENet, there are two general improvement methods for the network: (1) directly adding SENet after a convolution layer (applicable to any network, but it may generate a large number of convolution layers and parameters, thus decreasing the training and learning speed, reducing the identification effect, and requiring many experiments to determine which convolution layers to add SENet after), and (2) introducing the SE-ResNet module of SENet to replace the inception or residual layer in the original network. The SE-ResNet module is shown in Figure 11. In this method, the location of the embedded SE-ResNet module was relatively definite, and did not need to be confirmed by repeated experiments. There were multiple residual layers in the YOLOv3 network, so the network was improved by embedding the SE-ResNet module (Figure 12).
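The squeeze-excitation-scale pipeline described above can be sketched numerically. The version below uses NumPy rather than PyTorch so the data flow is explicit; the FC weights would be learned in practice, and the random values here are purely illustrative:

```python
import numpy as np

def se_block(feat, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.
    w1: (C, C//r) dimension-reduction weights; w2: (C//r, C) restoration."""
    # squeeze: global average pooling, (C, H, W) -> (C,)
    z = feat.mean(axis=(1, 2))
    # excitation: FC -> ReLU -> FC -> sigmoid, one weight per channel
    hidden = np.maximum(z @ w1, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # each in (0, 1)
    # scale: reweight every channel of the original map
    return feat * weights[:, None, None]

rng = np.random.default_rng(0)
c, r = 32, 16
feat = rng.standard_normal((c, 13, 13))
out = se_block(feat, rng.standard_normal((c, c // r)), rng.standard_normal((c // r, c)))
print(out.shape)  # (32, 13, 13)
```

Because every channel weight lies in (0, 1), the block can only attenuate uninformative channels relative to informative ones; the spatial layout of the features is untouched.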

YOLOv3-CSE network
In combination with the four aforementioned improvement methods, the original YOLOv3 network was modified. Specifically, the original feature extraction network DarkNet-53 was improved to DarkNet-43 network, the number of residual modules was decreased from 5 to 4, and the numbers of residual structures in each stage were set to 1, 2, 8 and 8, respectively. In addition, the SE-ResNet module was embedded into the improved YOLOv3 network, that is, the SE-ResNet module was embedded into the last residual layer of the Residual structure at the end of each stage. In the improved YOLOv3 network, the residual structures in the last stage of the original YOLOv3 network were removed, so the original 107 layers were reduced to 94 layers. Moreover, as 3 SE-ResNet layers were embedded, the improved YOLOv3 network was composed of 97 layers instead of 107 layers in the original one. The improved YOLOv3 network was named YOLOv3-CSE network in the present study, whose structure is shown in Figure 12.

Environment configurations
The hardware configurations for model training and testing are as follows: Intel(R) Core(TM) i7-10700K CPU @ 3.80 GHz with 24 GB of DDR memory and an NVIDIA GeForce RTX 3080 Ti graphics card, running Windows 10 Professional with CUDA 11.0. The OpenCV 4.5.3 vision library was used for morphological image processing, and PyTorch 1.9.0 was employed as the framework in a Python 3.7 environment to implement the training, testing and application of the whole algorithm.

Scheme design
Four improvement schemes based on the YOLOv3 network were compared, and a series of indicators of the sugarcane bud identification effect under the different improvement schemes were comprehensively analyzed (Table 1). The Mosaic data augmentation was introduced in Scheme 1, and the original network feature layers were additionally improved in Scheme 2 on the basis of Scheme 1. In Scheme 3, IoU was further modified into CIoU based on Scheme 2. Finally, the YOLOv3-CSE network was embedded with the SE-ResNet module on the basis of Scheme 3.

Scheme        Data augmentation   Network feature layer   Loss function   SE-ResNet module
1             ✓
2             ✓                   ✓
3             ✓                   ✓                       ✓
YOLOv3-CSE    ✓                   ✓                       ✓               ✓

Table 1. Scheme design.

Evaluation criteria
In this study, precision, recall, F1-score, and mAP were selected as the evaluation criteria to evaluate the designed model. Based on the IoU threshold, a detection was counted as negative when the IoU of the prediction box and the ground truth box was less than 0.5, and as positive when the IoU exceeded 0.5.
Where, TP is the number of positive samples of correctly identified sugarcane buds, and FP denotes the number of positive samples of erroneously identified sugarcane buds.
Where, FN denotes the number of sugarcane buds that were not detected (false negatives).
Where, P(R) represents the P-R curve function, and n stands for the number of identification types. Only one identification type was involved in this study, so n=1 was adopted.
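The criteria above can be expressed compactly. The sketch below computes precision, recall and F1 from counts, and single-class AP as the area under a sampled P-R curve; the counts and curve points are illustrative, not results from the paper:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives and
    false negatives (missed sugarcane buds)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def average_precision(recalls, precisions):
    """AP as the area under the P-R curve, integrated over recall
    increments; with a single class (n = 1), mAP equals this AP."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

p, r, f1 = precision_recall_f1(tp=95, fp=3, fn=5)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.9694 0.95 0.9596
```

With only one identification type (sugarcane bud), the mAP reported in this paper reduces to the AP of that single class.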
Model training
Some parameters of the YOLOv3 network could only be determined through repeated testing. In order to select the optimal parameter values, the model was tested repeatedly, and it was found that the accuracy of the model was relatively high when the learning rate was equal to 0.001. As a result, the initial learning rate was set to 0.001, which was decreased gradually as the number of training iterations increased. The final learning rate, IoU threshold, batch size, confidence and number of iterations were set to 0.0001, 0.5, 12, 0.01 and 100, respectively. The settings of the model parameters are listed in Table 2. During training, small-batch training was conducted with 12 images as a batch, and the weights were updated once after the training of each batch of images. After training, the 100 weights were screened to generate 10 weight files with relatively small test loss for inspection, from which the one with the highest mAP was selected as the final weight file. Finally, the test set and the validation set were tested, and the test results were saved. The change curves of the loss value of five different network models during sugarcane bud training are shown in Figure 13. In the initial training stage, the sugarcane bud detection model manifested high learning efficiency and fast training convergence. As the number of iterations increased, however, the slope of the training curve gradually decreased. When the number of training iterations reached 50, the changes in loss value tended to become stable after obvious fluctuations in convergence, and the loss value converged slowly, uniformly and finally stably after 100 rounds of training. Additionally, the loss value rose slightly beyond 100 rounds, indicating overfitting to the training set of the model. 
Hence, 100 rounds was determined as the termination condition of model training in comprehensive consideration of the accuracy of the trained network model, so as to avoid overfitting of the model due to excessive training. According to the change curves of the loss value of the five different network models, the YOLOv3-CSE network displayed notably faster training convergence and milder fluctuations in convergence, as well as a slightly lower loss value after training, than the other four network models.
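The learning-rate decay described above (0.001 at the start, 0.0001 at the end of 100 rounds) can be sketched as a schedule function. The paper only reports the two endpoints, so the linear interpolation used here is an assumption for illustration:

```python
def learning_rate(epoch, lr0=1e-3, lr_final=1e-4, total_epochs=100):
    """Learning rate decayed from the initial to the final value over the
    run. Only the endpoints (0.001 -> 0.0001) come from the paper; the
    linear schedule itself is an illustrative assumption."""
    frac = min(epoch / total_epochs, 1.0)
    return lr0 + (lr_final - lr0) * frac

print(learning_rate(0), learning_rate(50), learning_rate(100))
```

Clamping the fraction at 1.0 keeps the rate pinned at the final value if training were ever extended past 100 rounds.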

Results and analysis

Analysis of model indicators under different hyperparameters
The performance of the sugarcane bud identification network model at different IoU thresholds is shown in Figure 14. Within a certain range, the IoU threshold directly influenced the precision of the model, while the recall remained essentially unchanged. As the IoU threshold increased, the required overlap between the predicted box and the ground-truth box rose, so more predictions were counted as false detections. At an IoU threshold of 0.5, recall and F1 stood at 95.87% and 0.94, respectively, indicating that the network model achieved a sufficiently high identification precision and thus provided a basis for identifying sugarcane buds and determining their orientation.
The performance of the model at different confidence thresholds is shown in Figure 15. As the confidence threshold increased, the mAP of the network model decreased. When the confidence threshold was below 0.6, F1, recall and precision remained unchanged; once it exceeded 0.6, F1 and recall began to decrease slowly while precision increased slowly. To obtain higher mAP and F1 values, the confidence threshold in this study was set to 0.01. With mAP and F1 at 95.87% and 0.94, respectively, the YOLOv3-CSE network achieved its best prediction results.
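A minimal sketch of how the two thresholds interact, assuming the standard detection-evaluation definitions (this is not the authors' evaluation script): a prediction counts as a true positive only if its confidence reaches the confidence threshold and its IoU with a ground-truth box reaches the IoU threshold, and F1 is the harmonic mean of the resulting precision and recall.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, at an IoU threshold of 0.5, a prediction overlapping a ground-truth bud with IoU 0.6 counts as a true positive, while the same box counts as a false detection once the threshold is raised above 0.6.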

Comparison of performance improvement between network models
The identification performance of the models is compared in Table 3.

Comparison with other identification network models
To further test the effectiveness of the YOLOv3-CSE network in identifying sugarcane bud features, training was conducted on the same dataset, and the network model in this study was compared with other network models on the same indicators. According to Table 4, the network models differed obviously in performance indicators. Specifically, CenterNet had the lowest mAP and precision, despite the smallest weight file and a high identification speed per image. With VGG16 as the backbone network, Faster RCNN had a relatively low mAP, the longest per-image identification time and the largest weight file. With ResNet50 as the backbone network, RetinaNet achieved higher mAP and precision than CenterNet and Faster RCNN, but lower than YOLOv4 [12]. YOLOv4 had better overall performance indicators but a relatively larger weight file, and it remained slightly inferior to YOLOv3-CSE in all performance indicators.
The sugarcane bud identification results of all network models are shown in Figure 16. Except for YOLOv3-CSE and YOLOv4, the network models showed a deviation between the predicted location and the real location of sugarcane buds, and CenterNet also missed some targets. YOLOv3-CSE outperformed YOLOv4 in terms of confidence when identifying sugarcane buds.

Conclusions
A YOLOv3-CSE-based sugarcane bud identification method was proposed in this study. Before training the network model, data augmentation was conducted to enrich the diversity of the data, thereby mitigating the inadequate generalization ability observed when training on small datasets. Then, the feature layer of the network and the bounding-box regression loss function were improved. Finally, the SE-ResNet module was embedded to reduce the parameters and computation, increase the identification speed, and decrease the size of the network model. Great improvements were made in identification speed and precision, and the performance and identification effect of each network model were compared. The research results are as follows:
1) The improvements reduced the parameters and computation of the network model. For YOLOv3, the size of the weight file was decreased by 50.89%, from 240.7 MB to 118.2 MB. In addition, the improved method was verified against the evaluation criteria of network model performance. The results showed that the mAP and precision of the improved network model reached 95.87% and 96.93%, respectively, with an identification time of 0.15 s. Compared with the original YOLOv3 network, mAP and precision rose by 4.95% and 5.66%, respectively.
2) When the IoU threshold was higher than 0.5, the values of mAP, precision, recall and F1 decreased with the rise of the IoU threshold. Given the IoU threshold, the values of mAP and F1 decreased as the confidence threshold increased. The results showed that when the IoU threshold was 0.5 and the confidence threshold was 0.01, the network model achieved the best prediction results.
3) The YOLOv3-CSE network solved the difficulty of dynamically identifying small sugarcane bud targets, and boasts strong robustness, good real-time performance, high accuracy and high detection speed, thus providing a technical guarantee for applying the intelligent directional seeding machine for sugarcane seeds.
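For reference, the CIoU measure underlying the improved bounding-box regression loss mentioned above can be sketched as follows. This follows the published CIoU definition (IoU penalized by normalized center distance and aspect-ratio mismatch), not the authors' specific implementation; the loss minimized during training is then 1 - CIoU.

```python
import math

def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU (CIoU) between two (x1, y1, x2, y2) boxes."""
    # plain IoU
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    iou = inter / (wa * ha + wb * hb - inter + eps)
    # squared distance between box centers
    rho2 = ((box_a[0] + box_a[2] - box_b[0] - box_b[2]) ** 2
            + (box_a[1] + box_a[3] - box_b[1] - box_b[3]) ** 2) / 4.0
    # squared diagonal of the smallest box enclosing both
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v
```

Unlike plain IoU, CIoU still yields a useful gradient when boxes do not overlap, which is what stabilizes regression and avoids the divergence noted during training.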