Yolov5-Pest. The network structure of the YOLOv5 algorithm is divided into three modules: backbone, neck, and head. The backbone performs feature extraction, the neck performs feature fusion, and the head performs target detection[11]. The backbone module uses a Cross Stage Partial Network (CSPNet) and Spatial Pyramid Pooling - Fast (SPPF) to extract features from the input image and pass them to the neck module. The neck module uses a Path Aggregation Network (PANet) to build feature pyramids that bidirectionally fuse low-level spatial features with high-level semantic features, enhancing the detection of objects at different scales. The head module predicts on feature maps at three different scales; based on these multi-scale features, it generates prediction boxes for the target image and determines the category, coordinates, and confidence of each detected object.
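For intuition, the following minimal Python snippet (an illustration, not the authors' code) prints the grid sizes of the three prediction scales, assuming a 640×640 input, YOLOv5's default strides of 8, 16, and 32, and 3 anchors per scale:

```python
# Illustrative only: grid sizes and channel counts at YOLOv5's three head scales,
# assuming a 640x640 input, default strides (8, 16, 32), and 3 anchors per scale.
num_classes = 7                   # e.g. the pest categories used later in this study
per_anchor = num_classes + 5      # (x, y, w, h, objectness) + per-class scores
for stride in (8, 16, 32):
    g = 640 // stride             # grid resolution at this scale
    print(f"stride {stride:2d}: {g}x{g} grid, {3 * per_anchor} output channels")
# stride  8 -> 80x80 grid (small objects)
# stride 16 -> 40x40 grid (medium objects)
# stride 32 -> 20x20 grid (large objects)
```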
Depending on network width and depth, YOLOv5 comes in four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Among them, YOLOv5s offers the smallest model size while maintaining both detection speed and detection accuracy. Therefore, this article chooses YOLOv5s as the basic framework.
This article proposes a small-target detection algorithm for agricultural pests based on an improved YOLOv5 architecture. YOLOv5 is improved mainly in two aspects: the backbone network and feature fusion. The goal is to enrich spatial and semantic information and improve detection accuracy while maintaining running speed.
Firstly, the C3 module of the YOLOv5 backbone and the PANet structure of the neck are replaced by the C3CBAM module and the BiFPN structure, respectively. The C3CBAM module extracts pest image features and increases the weight of pest target regions in the feature map along both the channel and spatial dimensions; the BiFPN structure adds a path from high resolution to low resolution, improving the efficiency of the feature fusion process. In addition, a C3CA module is added to the neck to strengthen image feature extraction and residual feature learning, further improving detection performance. The overall structure is shown in Fig. 1.
The loss function of the improved model consists of the classes loss, the objectness loss, and the location loss. During training, the full objective function can be written as follows:
$$Loss = \lambda_{1}L_{cls} + \lambda_{2}L_{obj} + \lambda_{3}L_{loc} \quad (1)$$
where \({L}_{cls}\) is the classes loss, \({L}_{obj}\) is the objectness loss, \({L}_{loc}\) is the location loss, and \({\lambda }_{1}\), \({\lambda }_{2}\), and \({\lambda }_{3}\) are their weighting coefficients. Both the classification loss and the confidence loss use the BCE loss; the difference is that the classes loss is computed only over positive samples, whereas the objectness loss is computed over all samples. Here, objectness refers to the CIoU between the bounding box predicted by the network and the ground-truth (GT) box. The binary cross-entropy loss is defined as:
$$L = -y\log p - \left(1-y\right)\log\left(1-p\right) \quad (2)$$
When the input sample is positive, \(y\) is 1; when it is negative, \(y\) is 0. \(p\) is the probability predicted by the model that the input sample is positive. The location loss is the CIoU loss and is computed only for positive samples. In this study, we replace the CIoU loss with the DIoU loss, calculated as shown in Eq. (3):
$${L}_{DIoU} = 1 - \left(IoU - \frac{{\rho }^{2}\left(b,{b}^{gt}\right)}{{c}^{2}}\right) \quad (3)$$
where \(IoU\) is the intersection over union between the predicted and ground-truth boxes, \(b\) is the center point of the predicted box, \({b}^{gt}\) is the center point of the ground-truth box, \(\rho \left(\bullet \right)\) is the Euclidean distance, and \(c\) is the diagonal length of the smallest enclosing box covering the predicted and ground-truth boxes.
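As a concrete reference, the following PyTorch sketch computes the DIoU loss of Eq. (3) for axis-aligned boxes in (x1, y1, x2, y2) format. It is a minimal illustration under these assumptions, not the authors' implementation (YOLOv5 computes the same quantity inside its bbox_iou utility):

```python
import torch

def diou_loss(box1, box2, eps=1e-7):
    """DIoU loss of Eq. (3) for (x1, y1, x2, y2) boxes of shape (N, 4)."""
    # Intersection
    ix1 = torch.max(box1[:, 0], box2[:, 0])
    iy1 = torch.max(box1[:, 1], box2[:, 1])
    ix2 = torch.min(box1[:, 2], box2[:, 2])
    iy2 = torch.min(box1[:, 3], box2[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)

    # Union and IoU
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    iou = inter / (area1 + area2 - inter + eps)

    # rho^2: squared distance between the two box centers
    rho2 = ((box1[:, 0] + box1[:, 2] - box2[:, 0] - box2[:, 2]) ** 2 +
            (box1[:, 1] + box1[:, 3] - box2[:, 1] - box2[:, 3]) ** 2) / 4

    # c^2: squared diagonal of the smallest box enclosing both
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    return 1 - (iou - rho2 / c2)   # Eq. (3), one loss value per box pair
```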
C3CBAM. In object detection tasks, the importance of a target's features varies across channels, and the importance of pixels at different positions within each channel also varies. Small targets such as pests, or occluded targets, occupy few pixels in the feature map, so their feature information is easily lost in deep networks. Only by considering both levels of importance simultaneously can the model identify target objects more accurately. Attention mechanisms in neural networks focus on information of interest and ignore irrelevant information, enhancing important features and suppressing general ones. Among them, the CBAM[12] attention module combines spatial and channel attention: it effectively increases the weight of occluded or small targets in the feature map and can be easily embedded into any existing framework. It is a lightweight, simple, and effective convolutional neural network attention module. CBAM computes attention weights along the channel and spatial dimensions of the input feature map and multiplies them with the input feature map to obtain a new feature map, which facilitates extracting key information. The structure of the CBAM module is shown in Fig. 2.
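A minimal PyTorch sketch of the CBAM block described above follows; the layer sizes (reduction ratio 16, 7×7 spatial kernel) are the CBAM paper's defaults and are assumptions here, not values taken from this study:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Channel attention: global average- and max-pooled descriptors pass through
    # a shared MLP, are summed, and squashed by a sigmoid into channel weights.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(x.mean((2, 3), keepdim=True))
        mx = self.mlp(x.amax((2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    # Spatial attention: channel-wise mean and max maps are concatenated and
    # convolved into a single-channel sigmoid mask over spatial positions.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    # Applies channel attention, then spatial attention, to the input feature map.
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)
```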
The backbone feature extraction network of the original YOLOv5 adopts the C3 structure shown in Fig. 3(a). To strengthen image feature extraction and residual feature learning, the CBAM attention module is used in place of the bottleneck module inside C3, improving the detection of small and occluded targets. The overall structure of the C3CBAM module is shown in Fig. 3(b); it consists of multiple CBAM attention modules (the actual number N is the product of the n and depth_multiple parameters in the model's .yaml configuration file) and three standard convolution layers.
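Based on that description, one possible wiring of C3CBAM replaces the bottleneck stack of C3 with N CBAM blocks. The sketch below reuses the CBAM class from the previous listing and a simplified YOLOv5-style Conv; it is an assumption about the structure, not the authors' released code:

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    # Simplified YOLOv5-style standard convolution: Conv2d -> BatchNorm -> SiLU.
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class C3CBAM(nn.Module):
    # C3 wiring with the bottleneck stack replaced by N CBAM blocks: two parallel
    # 1x1 branches, CBAM applied on one branch, concatenation, then a 1x1 fuse.
    # Assumes the CBAM class from the previous sketch is in scope.
    def __init__(self, c1, c2, n=1):
        super().__init__()
        c_ = c2 // 2                                   # hidden channels
        self.cv1 = Conv(c1, c_, 1)
        self.cv2 = Conv(c1, c_, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(*(CBAM(c_) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```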
C3CA. The CA mechanism performs average pooling along the horizontal and vertical directions separately and fuses the weighted spatial information through positional encoding. The detailed procedure of the coordinate attention block is shown in Fig. 4(a), and its principle is described in reference [13]; it helps the network attend to informative coordinates and suppress uninformative ones, improving the efficiency of information flow. The overall structure of the C3CA module is shown in Fig. 4(b); it consists of multiple CA attention modules (the actual number N is the product of the n and depth_multiple parameters in the model's .yaml configuration file) and three standard convolution layers.
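For reference, a compact PyTorch sketch of the coordinate attention block of [13] is given below; C3CA then follows the same wiring as the C3CBAM sketch above with CoordAtt in place of CBAM. The reduction ratio of 32 is the CA paper's default, and SiLU stands in for the paper's activation; both are assumptions:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    # Coordinate attention: pool along H and W separately, encode the two
    # directional descriptors jointly, then split them into per-direction gates.
    def __init__(self, channels, reduction=32):
        super().__init__()
        c_ = max(8, channels // reduction)             # hidden width
        self.conv1 = nn.Conv2d(channels, c_, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_)
        self.act = nn.SiLU()
        self.conv_h = nn.Conv2d(c_, channels, 1)
        self.conv_w = nn.Conv2d(c_, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(3, keepdim=True)                          # (n, c, h, 1)
        x_w = x.mean(2, keepdim=True).permute(0, 1, 3, 2)      # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # height gate
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width gate
        return x * a_h * a_w
```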
BiFPN. Feature extraction networks help the model understand the context and content of images. To enhance the fusion of feature information at different scales, this study replaces the PANet structure of the original YOLOv5 with the BiFPN structure, as shown in Fig. 5. BiFPN has two core ideas [14]. First, compared with the original FPN structure, it adds a path from high resolution to low resolution, improving the efficiency of feature fusion. Second, it removes nodes that receive input from only a single node, making BiFPN more efficient and lightweight than PANet. Backbone features at different scales are fused through upsampling and downsampling to unify their resolutions, and horizontal connections between the original input and output nodes at the same scale reduce the feature information lost across many network layers. Therefore, as the model deepens, feature fusion becomes increasingly comprehensive.
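The weighted (fast normalized) fusion that [14] applies at each BiFPN node can be sketched as follows; this mirrors the BiFPN paper's formulation and is illustrative rather than the exact fusion layer used in this study:

```python
import torch
import torch.nn as nn

class BiFPNFusion(nn.Module):
    # Fast normalized fusion from the BiFPN paper [14]: each incoming feature map
    # gets a learnable non-negative weight, normalized so the weights sum to ~1.
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, xs):
        # xs: feature maps already resized (up/downsampled) to a common resolution
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * xi for wi, xi in zip(w, xs))

# e.g. a node fusing a top-down feature with the lateral backbone feature:
# fuse = BiFPNFusion(2); p4 = fuse([p4_td, p4_in])
```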
Datasets. Wu et al.[15] proposed IP102, a large-scale benchmark dataset for pest detection and identification. It contains over 75,000 images across 102 categories, exhibiting a natural long-tailed distribution, and provides 19,000 images annotated with bounding boxes for object detection. It covers pests at different life stages, including eggs, larvae, pupae, and adults, with labels for categories such as rice leaf caterpillar, rice stem maggot, and cicada. Its main characteristics are: (1) a hierarchical classification system; (2) a natural long-tailed distribution; (3) an unbalanced data distribution; (4) a rich variety of pests; (5) small inter-class differences but large intra-class differences.
In this study, we selected seven specific categories from the IP102 dataset: brown planthopper, Asian rice borer, corn borer, rice leaf roller, rice leafhopper, wheat thrips, and beet armyworm, to form a new dataset of 7526 images. We chose these categories because these pests mainly occur in rice, wheat, corn, and sugar beet, and thus represent typical pests of common crops. Some images from the dataset are shown in Fig. 6.
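A hypothetical sketch of how such a subset can be extracted from IP102's detection annotations is shown below; the directory layout, VOC-style XML format, and category spellings are placeholders for illustration, not the authors' actual pipeline:

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical category spellings; match them to the names in your IP102 copy.
SELECTED = {"brown planthopper", "asian rice borer", "corn borer",
            "rice leaf roller", "rice leafhopper", "wheat thrips",
            "beet armyworm"}

src_ann = Path("IP102/Annotations")    # assumed VOC-style XML annotation folder
src_img = Path("IP102/JPEGImages")     # assumed image folder
dst = Path("pest7")
(dst / "images").mkdir(parents=True, exist_ok=True)
(dst / "annotations").mkdir(parents=True, exist_ok=True)

kept = 0
for xml_file in src_ann.glob("*.xml"):
    root = ET.parse(xml_file).getroot()
    names = {obj.findtext("name", "").lower() for obj in root.iter("object")}
    if names & SELECTED:               # image contains at least one selected pest
        shutil.copy(src_img / (xml_file.stem + ".jpg"), dst / "images")
        shutil.copy(xml_file, dst / "annotations")
        kept += 1
print(f"kept {kept} images")
```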