Small Target Disease Detection based on YOLOv5 Framework for Intelligent Bridges

This paper proposes a small target disease detection method based on the YOLOv5 framework for detecting small apparent diseases on intelligent bridges, aiming to address the problems of missed and false detection. To enhance the detection of small apparent diseases, a detection layer for small objects is added to the YOLOv5 model. Additionally, an ECA attention mechanism module is embedded in the feature enhancement network to improve the extraction of disease features. To validate the effectiveness of the proposed algorithm, a dataset of 996 bridge images with apparent diseases such as corrosion, rebar, speckle, hole and spall was established, manually annotated, augmented and used for training. The experiments show that the proposed algorithm achieves an mAP of 87.91%. Compared to the original YOLOv5 model, the proposed algorithm improves the mAP on the bridge apparent disease dataset by 1.97%. The algorithm accurately detects small apparent diseases on bridges and effectively reduces missed detections.


Introduction
Along with the development of the economy and transportation infrastructure, bridges have become increasingly important to transportation safety, especially in highway transportation, where they are key to safe and efficient traffic. Our country is the world's leading builder of bridges, with nearly one million highway bridges currently in use and almost 30,000 new bridges added every year [1]. However, with the surge in traffic in recent years, many bridges have been subjected to overloading and natural aging, leaving a large number of structures in poor or even dangerous condition. It is therefore crucial to assess the health of bridges in a timely manner and to repair any damage or deterioration as quickly as possible. Common types of apparent damage on bridges include corrosion, rebar (exposed reinforcement), speckle, hole and spall [2]. When the rebar corrodes, the corroded section can expand to more than ten times its original size, squeezing the surrounding concrete and causing cracking, spalling and a reduction in the effective cross-section. This ultimately decreases the structural bearing capacity of the bridge. Corrosion can also damage the materials of the main bridge structure, further decreasing its bearing capacity. In addition, when spalling reduces the cross-sectional area, the stress on the remaining section increases, and the section becomes more susceptible to erosion by air or other harmful substances. Detecting and addressing such damage in a timely manner is therefore essential to reducing bridge damage and ensuring the safety and longevity of bridges.
At present, object detection is widely applied in computer vision [3][4][5], which makes it possible to detect bridge diseases in different environments. It is worth noting that object detection differs from other deep-learning tasks such as automatic modulation classification [6,7], specific emitter identification [8][9][10] and malware traffic classification [11,12]. Object detection algorithms fall into two categories: two-stage detectors [13][14][15][16] and single-stage detectors [17][18][19]. Two-stage detectors are generally more accurate, but they are slower in real-time detection. Moreover, both kinds of algorithms have performed poorly on small targets. Researchers have therefore started to focus on multi-scale feature learning [20] and small object detection [21,22]. Jang et al. [23] proposed an automatic bridge assessment method that combines visual and infrared thermal images to improve detectability and reduce false alarms caused by complex environments. Zhang et al. [24] proposed a single-stage detection method based on transfer learning and added batch normalization and focal loss to improve accuracy. Yu et al. [25] adjusted the original image directly instead of adjusting it just before it was fed into the network, and generated the anchor boxes with K-means clustering; the improved method achieved better performance.
Tan et al. [26] proposed a hybrid scaling method to scale the model and, at the same time, assigned different weights to the features output by the feature pyramid, with the weights updated adaptively by the network. Liu et al. [27] proposed the IPG-Net feature extraction pyramid, which provides more spatial information to the network. It mitigates the loss of small-target feature information in deep layers, but the model has a large number of parameters and is slow to compute. Liu et al. [28] fed the loss information back into the loss function to supervise the training of small targets, but the method's robustness is limited. Zhao et al. [29] proposed the SODet method, which relates objects to distant objects and extracts relevant local information from the image; its detection accuracy for small objects on the MS COCO dataset is 31.5%. To improve the detection accuracy of small objects, Brais et al. [30] optimized the model by improving the loss function and adding an attention mechanism, reaching a final accuracy of 37.66%. Aduen et al. [31] replaced the original backbone network with ResNet50 and DenseNet; the improved model detects faster, and its detection accuracy on small objects improved by 6.9%.
Bridge inspection itself also faces several difficulties. First, traditional manual inspection is costly and inefficient. Second, because bridge types vary widely, the inspection process involves a degree of subjectivity and error. Third, the complexity of bridge structures and environmental factors such as lighting conditions and occlusion pose challenges for object detection algorithms in bridge disease detection, especially for small target diseases. In this paper, we build and annotate a dataset for small target bridge apparent disease detection and use it for training. Finally, the feasibility of the proposed algorithm is verified on the test set.

Introduction of YOLOv5 Framework
YOLOv5 is a one-stage object detection algorithm released by Glenn Jocher et al. in 2020. Depending on the depth and width of the network, its models are, from smallest to largest, YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x; this paper uses YOLOv5s. YOLOv5 consists of three parts: a feature extraction backbone network, a feature pyramid [32], and a detection head. The backbone network, CSPDarknet, extracts features from the input image; its outputs are called feature layers. The backbone outputs three effective feature layers, and the feature pyramid, PANet, further enhances the features of these layers.
PANet fuses the three effective feature layers to combine feature information from different scales, producing three enhanced feature layers. Each feature map can then be viewed as a set of feature points, where each feature point is described by a feature vector whose length equals the number of channels. The detection head then examines each feature point to determine whether it corresponds to an object. Although YOLOv5 is fast and practical, it still has shortcomings when detecting small apparent defects:
• although YOLOv5 has three detection layers containing rich features, its prediction accuracy is low for small bridge diseases;
• YOLOv5 produces missed and false detections when detecting diseases.
To address these issues, this paper proposes an improved YOLOv5 network that is better suited to detecting small target diseases on bridges.
3 Our proposed detection method using improved YOLOv5 algorithm

The Addition of Small Target Disease Layer
In disease detection, the shooting distance determines the proportion of the image occupied by a disease. Because small targets occupy few pixels, they are more likely to be missed and are detected with lower accuracy. We therefore define small targets for the disease detection task: a disease is considered a small target if the ratios of its width and height to the original image size are less than or equal to 0.1.
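The small-target criterion can be expressed as a simple check on annotated boxes. The sketch below is illustrative only: the function names are hypothetical, and it assumes the criterion requires both the width ratio and the height ratio to be at most 0.1 (the definition does not say whether one or both ratios must satisfy the bound).

```python
def is_small_target(box_w, box_h, img_w, img_h, threshold=0.1):
    """Return True if a box counts as a small target under the 0.1 ratio rule.

    Assumption: both the width ratio and the height ratio must be <= threshold.
    """
    return box_w / img_w <= threshold and box_h / img_h <= threshold


def small_target_fraction(boxes, img_w=640, img_h=640, threshold=0.1):
    """Fraction of (width, height) boxes, given in pixels, that are small targets."""
    if not boxes:
        return 0.0
    small = sum(is_small_target(w, h, img_w, img_h, threshold) for w, h in boxes)
    return small / len(boxes)


if __name__ == "__main__":
    # On a 640x640 image the bound is 64 pixels in each direction.
    print(is_small_target(48, 30, 640, 640))   # True
    print(is_small_target(70, 20, 640, 640))   # False: 70/640 > 0.1
    print(small_target_fraction([(48, 30), (70, 20), (64, 64), (200, 150)]))  # 0.5
```

Applied to all annotated boxes, a helper like `small_target_fraction` gives one way to quantify how many targets in a dataset fall under this definition.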
For a 640×640 input image, the original YOLOv5 model outputs three feature layers of sizes 80×80, 40×40 and 20×20, from largest to smallest. The 80×80 feature layer has a receptive field (stride) of 8×8 pixels (640/80 = 8), i.e., each of its grid cells corresponds to an 8×8 region of the input image. Compared with the other two layers, the 80×80 feature layer is better suited to detecting small apparent defects. However, we found during detection that when a disease target is smaller than 8×8 pixels, the shallow feature information cannot be fully exploited, which leads to insufficient accuracy in small target recognition.
To improve the network's ability to detect small target diseases, this paper adds a detection layer specifically for small targets. Its output feature layer has a size of 160×160 and a receptive field (stride) of 4×4, enabling the detection of small disease targets down to about 4×4 pixels. A 160×160 feature map is taken from the backbone feature extraction network and sent to the feature fusion network (Neck) for feature fusion, which yields more complete features for detecting small disease targets. The complete improved network structure is shown in Fig. 1. The Prediction Head for Small Objects (PHSO) module performs small object detection, while the Efficient Channel Attention (ECA) module provides better feature representation.
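The effect of the added layer on the output grids can be checked with a little arithmetic. The sketch below assumes the default YOLOv5 head layout, in which each detection layer places 3 anchors on every grid cell and each anchor predicts 4 box offsets, 1 objectness score and one score per class; whether the PHSO layer also uses 3 anchors is an assumption, not something stated here. The helper names are hypothetical.

```python
def grid_sizes(img_size, strides):
    """Side length of the square output grid for each detection layer."""
    return [img_size // s for s in strides]


def num_predictions(img_size, strides, anchors_per_cell=3):
    """Total number of candidate boxes produced for one image."""
    return sum(anchors_per_cell * g * g for g in grid_sizes(img_size, strides))


def head_channels(num_classes, anchors_per_cell=3):
    """Output channels per detection layer: anchors x (4 box + 1 objectness + classes)."""
    return anchors_per_cell * (5 + num_classes)


ORIGINAL_STRIDES = (8, 16, 32)      # 80x80, 40x40, 20x20 for a 640x640 input
IMPROVED_STRIDES = (4, 8, 16, 32)   # adds the 160x160 small-object layer

if __name__ == "__main__":
    print(grid_sizes(640, ORIGINAL_STRIDES))       # [80, 40, 20]
    print(grid_sizes(640, IMPROVED_STRIDES))       # [160, 80, 40, 20]
    print(num_predictions(640, ORIGINAL_STRIDES))  # 25200
    print(num_predictions(640, IMPROVED_STRIDES))  # 102000
    print(head_channels(num_classes=5))            # 30 for the five disease classes
```

Under these assumptions, the roughly fourfold increase in candidate boxes per image (25,200 to 102,000) illustrates the extra work the added layer introduces in the head and in post-processing.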

Integrating ECA Attention Mechanism
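The channel-attention computation described in the following paragraph can be sketched with NumPy for a single feature map of shape (C, H, W): global average pooling produces one value per channel, a 1D convolution of odd size k slides across neighboring channels without any dimensionality reduction, and a sigmoid turns the result into per-channel weights that rescale the input. This is a forward-pass sketch only, with hypothetical function names; in a trained network the 1D kernel is a learned parameter (for example a bias-free `torch.nn.Conv1d`). The kernel-size helper implements the odd rounding by truncating and moving even values up to the next odd number, which is one common reading of the rule and reproduces k = 5 for C = 128, 256 and 512 with b = 1 and γ = 2.

```python
import math

import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: an odd value near (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1  # even values are moved up to the next odd number


def eca_forward(x, weight):
    """Apply ECA-style channel attention to one feature map.

    x: array of shape (C, H, W); weight: 1D kernel of odd length k.
    """
    c = x.shape[0]
    k = weight.shape[0]
    y = x.mean(axis=(1, 2))                    # global average pooling -> (C,)
    y_pad = np.pad(y, (k - 1) // 2)            # zero padding keeps C outputs
    # 1D convolution (cross-correlation) across neighboring channels
    z = np.array([y_pad[i:i + k] @ weight for i in range(c)])
    attn = 1.0 / (1.0 + np.exp(-z))            # sigmoid -> per-channel weights in (0, 1)
    return x * attn[:, None, None]             # rescale each channel


if __name__ == "__main__":
    for channels in (128, 256, 512):
        print(channels, eca_kernel_size(channels))   # k = 5 in all three cases
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((128, 20, 20))
    kernel = rng.standard_normal(eca_kernel_size(128))
    out = eca_forward(feat, kernel)
    print(out.shape)                                  # (128, 20, 20)
```

Because the attention output has the same shape as its input, such a block can be inserted into the feature enhancement network without changing the surrounding layers.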
ECA is a lightweight, plug-and-play channel attention module whose network structure is shown in Fig. 2. Without reducing the channel dimension, it captures the interaction between neighboring channels and assigns each channel a weight according to how much information relevant to bridge disease detection that channel carries. This highlights useful feature information, suppresses irrelevant information, and ultimately improves disease detection. In Fig. 2, H and W denote the height and width of the feature map, and C denotes the number of channels. The size k of the convolution kernel depends on C and is chosen adaptively by
$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \tag{1}$$
where |x|odd denotes the nearest odd number to x, and b and γ are set to 1 and 2, respectively. In this paper, C takes the values 128, 256 and 512, and the corresponding k is 5 in all three cases.

4 Bridge Disease Dataset and Hyperparameter Setting

Bridge disease image dataset
We selected 996 images depicting five common apparent bridge diseases: corrosion, rebar, speckle, hole and spall. All images are 640×640 pixels. To make the model versatile, the photographs were taken at no fixed distance or angle, so the distance, angle and shooting environment vary greatly between images; for example, some images have insufficient lighting or contain weeds or graffiti. The filtered images were annotated with the open-source LabelImg tool, and a selection of the annotated images is shown in Fig. 3. Because each category in the original dataset has a limited number of labels and the label counts are highly imbalanced across diseases, the dataset was augmented. Four augmentation techniques, namely horizontal flip (H-Flip), vertical flip (V-Flip), translation (Transl), and crop and pad (Cr-Pad), were applied to the rebar category, and the hole and speckle categories were augmented with the same four techniques. Because corrosion and spall appear in the same images, those images were augmented with H-Flip and V-Flip. In total, 2022 augmented images were obtained.
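When images are flipped for augmentation, the box annotations must be flipped with them. The sketch below assumes the LabelImg annotations were exported in the YOLO text format (class index followed by box center x, center y, width and height, all normalized to [0, 1]); LabelImg can also save Pascal VOC XML, and the text does not say which format was used. The function names are hypothetical, the image pixels themselves would be flipped with any image library, and translation and crop-and-pad (which also shift and clip boxes) are not shown.

```python
def hflip_labels(labels):
    """Horizontal flip: mirror the x center; y, width and height are unchanged."""
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in labels]


def vflip_labels(labels):
    """Vertical flip: mirror the y center; x, width and height are unchanged."""
    return [(cls, xc, 1.0 - yc, w, h) for cls, xc, yc, w, h in labels]


def parse_yolo_line(line):
    """Parse one 'cls xc yc w h' line of a YOLO-format label file."""
    parts = line.split()
    return (int(parts[0]), *map(float, parts[1:5]))


if __name__ == "__main__":
    labels = [parse_yolo_line("3 0.20 0.30 0.05 0.08")]  # one small box near the top left
    print(hflip_labels(labels))   # center x moves to about 0.8
    print(vflip_labels(labels))   # center y moves to about 0.7
```

Because normalized coordinates are independent of image size, the same two functions work for any resolution, including the 640×640 images used here.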
Fig. 4 shows the distribution of the positions and sizes of the target boxes in the apparent disease dataset. Fig. 4a shows the distribution of target box centers after the image resolution is normalized; the darker a point, the more target box centers are concentrated there. Fig. 4b shows the ratios of the width and height of the target boxes to those of the entire image. The figure shows that small targets make up a relatively high proportion of the dataset.

Experimental environment
The system used in this paper is Ubuntu 20.04, with an Intel(R) Xeon(R) Gold 6338 processor, 128 GB of memory, and an NVIDIA A40 graphics card.The programming language used is Python, and the network model is based on the PyTorch deep learning framework, specifically PyTorch 1.8.1, CUDA 11.4, and Python 3.8.13.

Experimental parameter setting
SGD is used as the optimizer during training. The initial learning rate is 0.01 and follows a cosine annealing schedule, the momentum factor is 0.937, the batch size is 4, and training runs for 800 epochs. The model input size is 640 × 640, and the dataset is split into training, validation and test sets in the ratio 8:1:1. During training, the dataset is augmented with vertical flipping, horizontal flipping and random cropping to improve the model's generalization ability. After data augmentation, 3018 images are obtained in total, of which 2414 are used for training. In each mini-batch iteration, the model predicts bounding boxes and their scores, compares them with the ground-truth labels, and computes the loss value with the loss function; the model weights are then updated by stochastic gradient descent to minimize the loss. During training, the model's performance is evaluated on 302 validation images and quantified by the mean Average Precision (mAP). After training, the performance of the original YOLOv5 model and the improved YOLOv5 bridge multi-disease detection model is compared, and the trained improved model is applied to the apparent disease images of the test set to demonstrate the algorithm's effectiveness. In addition, the improved model is compared with current mainstream object detection algorithms to further analyze its performance in bridge apparent disease detection.
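The split sizes and learning-rate schedule can be reproduced with a few lines of arithmetic. The sketch assumes the 8:1:1 split takes 80% of the 3018 images (rounded down) for training and divides the remainder equally between validation and test, which matches the 2414 training and 302 validation images reported. It also assumes plain cosine annealing from 0.01 down to a final learning rate of 0 over 800 epochs; the final learning rate is not given in the text, and YOLOv5's own training script typically decays toward `lr0 * lrf`, where `lrf` is a hyperparameter. In PyTorch the optimizer itself would be created as `torch.optim.SGD(params, lr=0.01, momentum=0.937)`. The function names are hypothetical.

```python
import math


def split_counts(n_images, ratios=(8, 1, 1)):
    """Split n_images into train/val/test counts for a train:val:test ratio.

    Assumption: training gets floor(n * r_train / sum) images and the
    remainder is divided between validation and test by r_val : r_test.
    """
    total = sum(ratios)
    n_train = n_images * ratios[0] // total
    rest = n_images - n_train
    n_val = rest * ratios[1] // (ratios[1] + ratios[2])
    return n_train, n_val, rest - n_val


def cosine_lr(epoch, total_epochs=800, lr0=0.01, lr_min=0.0):
    """Cosine-annealed learning rate at a given epoch (lr_min is an assumption)."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))


if __name__ == "__main__":
    print(996 + 2022)                 # 3018 images after augmentation
    print(split_counts(3018))         # (2414, 302, 302)
    for epoch in (0, 200, 400, 800):
        print(epoch, round(cosine_lr(epoch), 6))
```

The schedule starts at the initial rate of 0.01, passes through half of it at the midpoint (epoch 400), and reaches `lr_min` at the last epoch.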

Model performance evaluation metrics
The evaluation metrics used in this paper are precision, recall, the per-class average precision (AP), and the mean average precision (mAP) over all disease classes. Precision P and recall R are calculated by Equations (2) and (3), where TP is the number of correctly detected targets of the current disease class, FP is the number of detections of other disease types incorrectly identified as the current type, and FN is the number of targets of the current disease class that are not detected.
$$P = \frac{TP}{TP + FP} \tag{2}$$
$$R = \frac{TP}{TP + FN} \tag{3}$$
A detection box whose IoU with a ground-truth box is greater than or equal to the given threshold is counted as TP; otherwise it is counted as FP. A ground-truth disease target that is not matched by any prediction box is counted as FN. AP and mAP are calculated by Equations (4) and (5), where n is the total number of disease categories.
$$AP = \int_0^1 P(R)\,dR \tag{4}$$
$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i \tag{5}$$
The mAP is the main metric used to evaluate the model in this paper.
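The metrics described above can be computed as follows. This is a minimal sketch, not the paper's evaluation code: it uses box IoU for the TP/FP decision and computes AP as the area under the precision-recall curve with the all-point interpolation of the Pascal VOC protocol (precision made monotonically non-increasing before summing). YOLOv5's own evaluation interpolates the curve at 101 recall points by default, so values can differ slightly. Boxes are (x1, y1, x2, y2) corner coordinates, and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def average_precision(is_tp, num_gt):
    """AP for one class from detections sorted by descending confidence.

    is_tp[i] is True if detection i was matched (IoU >= threshold) to a
    not-yet-matched ground-truth box, and False otherwise (an FP).
    """
    if num_gt == 0:
        return 0.0
    recalls, precisions = [], []
    tp = fp = 0
    for hit in is_tp:
        tp += hit
        fp += not hit
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # precision envelope: make precision non-increasing from right to left
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum rectangle areas wherever recall increases
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1) if mrec[i + 1] != mrec[i])


def mean_average_precision(aps):
    """mAP = mean of the per-class APs."""
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))          # 1/7, about 0.143
    print(precision_recall(tp=3, fp=1, fn=1))        # (0.75, 0.75)
    ap = average_precision([True, True, False, True], num_gt=4)
    print(ap)                                        # 0.6875
    print(mean_average_precision([ap, 0.8125]))      # 0.75
```

In the example, four ranked detections (TP, TP, FP, TP) against four ground-truth targets give an AP of 0.6875; averaging the per-class APs then gives the mAP used throughout the experiments.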

Comparison of experimental results between the proposed model and the YOLOv5 algorithm
Table 1 compares the performance of YOLOv5 and the improved YOLOv5 model. The improvement is most significant for the hole class, whose targets are generally small, and the overall mAP of the improved model is 1.97 percentage points higher than before the improvement. Next, disease images are used to demonstrate the detection of the five diseases: corrosion, rebar, speckle, hole and spall. The detection results of YOLOv5 and the improved YOLOv5 model are shown in Fig. 5 and Fig. 6, respectively. Fig. 5 and Fig. 6 show that the improved YOLOv5 algorithm achieves higher apparent disease recognition accuracy: it not only identifies hole and rebar diseases with small target sizes, but also recognizes spall, corrosion and speckle diseases more accurately. It also maintains considerable recognition accuracy under complex background conditions. The improved model effectively reduces false and missed detections caused by complex backgrounds, insufficient illumination, densely distributed diseases and small disease sizes. In addition, to compare the YOLOv5 algorithm and the improved YOLOv5 algorithm more intuitively, this paper uses the Precision-Recall (PR) curve, whose horizontal axis is recall and whose vertical axis is precision.

Analysis of ablation experiments
Additionally, to analyze the effect of different improvement methods on the model's detection performance, three groups of experiments were designed, each using the same dataset and training parameters to complete the training.The first group represents the YOLOv5 algorithm, the second group adds the small object detection layer, and the last group uses the ECA attention mechanism module on top of the second group, which is the improved YOLOv5 algorithm proposed in this paper.The specific experimental results are shown in Table 2.
According to Table 2, the original YOLOv5 (first group) achieves an mAP of 85.94%. In the second group, a small object detection layer is added together with the required feature extraction module. This produces a larger (160×160) feature map for detecting disease targets smaller than 8×8 pixels, and the mAP increases by 0.62%. In the third group, the proposed model adds the ECA attention mechanism on top of the second group, and its mAP reaches 87.91%. Overall, the proposed improvement strategy has a positive effect. Compared to the original YOLOv5 model, the AP for the hole, rebar, spall and speckle apparent diseases increases by 6.43%, 1.12%, 0.63% and 1.92%, respectively, giving an overall mAP increase of 1.97%. This improvement allows small apparent defects on bridges to be detected more accurately.

Comparison between the proposed model and other object detection models
To further analyze the performance of the improved model on multiple apparent bridge diseases, the improved YOLOv5 model was compared with the CenterNet [34], Faster R-CNN [35] and SSD [36] detection algorithms. All of these models use the same dataset and parameters, with mAP as the main evaluation metric. The experimental results of the five models are shown in Table 3. Compared to the CenterNet, Faster R-CNN and SSD models, the mAP of the improved model is higher by 2.72%, 19.96% and 19.07%, respectively. These experiments show that the improved YOLOv5 bridge apparent disease model achieves the highest detection accuracy among the compared models.

Analysis of robustness experiments
To verify the robustness of the proposed model, we used the VisDrone2019 object detection dataset released by Tianjin University in 2019 [37]. The dataset consists of UAV photographs of 10 types of targets, most of which are small objects. The results are shown in Table 4, and it can be observed that the proposed model significantly outperforms other mainstream object detection models.

Conclusions
We propose a bridge apparent disease detection model based on an improved YOLOv5, which includes a small target disease detection layer to address the missed detection of small diseases. This effectively reduces missed detections caused by factors such as the small size of disease targets. We also introduce the ECA attention mechanism to suppress complex background feature information of the bridge surface and highlight the feature information of actual diseases. Compared to the original model, the improved YOLOv5 model increases the mAP by 1.97%, allowing bridge apparent diseases to be identified more accurately. However, although the improved model detects more accurately, its larger size reduces its real-time performance. Future research will focus on compressing the model while maintaining accuracy, to ease deployment.

Fig. 1 Overall structure of the improved YOLOv5 network.

Fig. 2 Module structure of the ECA attention mechanism.

Fig. 3 Examples of real annotations of multiple apparent bridge diseases.

Fig. 4 (a) Distribution of the center points of the target boxes. (b) Distribution of target box sizes.

Fig. 5 Detection results of YOLOv5 on multiple diseases.

Fig. 6 Detection results of the improved YOLOv5 on multiple diseases.

Table 1
Comparison table of YOLOv5 and improved YOLOv5 model performance.

Table 2
Comparison table of YOLOv5 and improved YOLOv5 model performance.

Table 3
Comparison table of YOLOv5 and improved YOLOv5 model performance.

Table 4
Comparison table of YOLOv5 and improved YOLOv5 model performance.