Real-time power line segmentation detection based on multi-attention with strong semantic feature extractor

Power lines are an important part of the transmission network and the only carrier of power transmission, so power line detection is an important means of ensuring the stable operation of the power system. To improve the efficiency of power line detection, this paper proposes a real-time power line segmentation method based on a multi-attention mechanism and strong semantic feature extraction, built on the DeepLab V3+ encoder-decoder model. In the encoder, the Convolutional Block Attention Module (CBAM) is first introduced into the MobileNetV2 network to strengthen contextual information interaction. An ASPPAttention fast feature fusion structure is then proposed, which rapidly extracts multi-dimensional effective information using depthwise separable convolutions with different receptive fields and strengthens pixel-level feature encoding through the coordinate attention (CA) mechanism. In the decoder, a lightweight inverted convolutional decoder is proposed, which improves the feature extraction capability of the model by inserting an inverted bottleneck convolution structure with few parameters between the two fourfold upsampling layers, and avoids heterogeneous splitting by introducing the CA attention mechanism. During training, model convergence is accelerated by transferring weights pre-trained on VOC, and Dice Loss is introduced to reduce the effect of the sample distribution on model generalisation. Experimental results show that the mean intersection over union (mIoU) of the proposed model reaches 48.5%, the accuracy reaches 97.5%, and the detection speed reaches 40.8 frames per second (fps), giving a better balance of speed and accuracy than HRNet, PSPNet, DeepLab V3+ and other network models.


Introduction
Power lines are an important part of the transmission network and the only means of transporting electricity. Power line inspection is therefore essential to the smooth operation of transmission lines. However, current power line inspection relies mainly on manual inspection, which is not only inefficient but also extremely costly. Manual inspection is also highly dangerous: according to statistics from the China National Energy Administration, 35 power safety accidents occurred in 2020, resulting in 44 deaths. Moreover, the further development of China's ultra-high-voltage and other projects poses an even more serious challenge to power inspection. It has therefore become imperative to explore efficient, convenient and safe transmission line inspection methods.
In the early days, besides manual visual inspection, power lines were mainly inspected by electrical and optical signal measurement [1][2][3] and by remote sensing drone capture [4,5]. Electrical and optical characteristic measurement uses specific equipment to detect whether the transmission line produces abnormal light waves or electromagnetic signals, and thereby determines whether the line is faulty. Such methods have some academic value, but their results in actual detection are unsatisfactory, for the following reasons. (1) At actual inspection sites, photoelectric detection equipment is easily disturbed by natural light. (2) The equipment is easily affected by electromagnetic interference from power equipment, which degrades the detection results. (3) These methods cannot detect faults preventively, so economic losses cannot be avoided.
In contrast, drones and remote sensing equipment [6,7] avoid interference between the detection equipment and the transmission line under test, do not rely on manpower, and offer a relatively higher safety factor. Depending on the algorithms used, these methods can be divided into traditional image processing algorithms and deep learning algorithms. Traditional image processing algorithms analyse the collected data by means such as edge detection, image morphology and wavelet transforms. They place high demands on image quality and generalise poorly, so they struggle with power inspection in complex scenes. Deep learning algorithms generalise strongly and make up for these deficiencies, so they can be better applied to transmission line inspection in complex scenes. However, most existing transmission line inspection algorithms have high complexity [8][9][10] and do not achieve a good balance between detection accuracy and speed. To address this, this paper proposes a real-time transmission line segmentation method based on a multi-attention mechanism that incorporates a strong semantic feature extractor. The main contributions of this paper are as follows. (1) For towers and power lines captured from multiple angles, this paper builds on the DeepLab V3+ framework [11] and introduces the CBAM attention mechanism [12] into the downsampling feature extractor of the encoder to strengthen contextual information interaction. In the deepest layer, the CA attention mechanism [13] is added to improve the model's attention to fine-grained pixel-level features. At the same time, a strong semantic feature extraction decoder is proposed, in which the original DeepLab V3+ decoder is redesigned with successively stacked inverted bottleneck convolution blocks [14] with few parameters; this improves the detection performance of the model while adding only a small amount of computation.
(2) The open-source TTPLA dataset [15] is used for the experiments; example images are shown in Fig. 1. To address the uneven distribution of data samples and the network convergence problem, transfer learning [16] and Dice Loss [17] are adopted as training strategies, and model convergence is accelerated by transferring weights pre-trained on the VOC [18] dataset. Comparison and ablation experiments demonstrate that the proposed model performs power inspection tasks better than the compared models.
The rest of the article is structured as follows. Section 2 introduces related work on transmission line inspection. Section 3 describes the real-time power line segmentation method based on the multi-attention mechanism and the strong semantic feature decoder. Section 4 presents the comparative experiments, analysis and discussion. Section 5 concludes the paper.

Related work
In this section, we present recent approaches to power line detection from two perspectives: methods based on target detection and methods based on semantic segmentation.
Deep learning techniques that can be deployed on edge devices offer new possibilities for power detection. Depending on the techniques used, they can be categorised into target detection-based power detection methods, encoder-decoder (codec)-based semantic segmentation methods, and generative adversarial network (GAN)-based semantic segmentation methods.

Power detection methods based on target detection
Transmission line detection methods based on target detection mainly use network frameworks such as YOLO [19][20][21], Faster R-CNN [22] and SSD [23]. In these methods, target regions are first selected with data labelling tools to obtain data labels. The resulting dataset is then split in a certain ratio into training and validation sets, and the training set is used to train the network to obtain the most suitable detection model. In actual detection, the results of such methods are mostly framed by regular rectangular boxes. Such boxes cannot closely fit the objects to be detected, especially thin, elongated or obliquely oriented ones, so the detection results are less accurate. Typical research results are as follows. Chen Binghuang et al. [24] improved the YOLOv3 framework and proposed a fast insulator detection algorithm based on a fused FPN network model, with an accuracy of up to 90%; Na Kyung-Min et al. [25] improved the Faster R-CNN model, effectively solving many problems in railway pantograph breakage detection; Wang Shanshan et al. [26] addressed problems in the detection of overhead insulators with an algorithm based on an improved YOLOv4, whose detection accuracy was 36.4% higher than that of the unimproved model. These results show that target detection algorithms are of some value for power detection, but power detection based on target detection still has many shortcomings, most importantly overly large labelled boxes and weak detection ability for fine-grained targets.
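To make the box-fitting problem concrete, the hypothetical helper below (numpy assumed) measures how much of an axis-aligned bounding box a thin wire actually occupies. It is an illustration of the geometric argument, not part of any of the cited methods.

```python
import numpy as np

def box_fill_ratio(mask):
    """Fraction of a binary mask's axis-aligned bounding box covered by the object."""
    rows, cols = np.nonzero(mask)
    box_area = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
    return mask.sum() / box_area

# Hypothetical example: a one-pixel-wide diagonal wire across a 100 x 100 image.
wire = np.eye(100, dtype=bool)
print(box_fill_ratio(wire))  # 0.01, i.e. 99% of the box is background

# A compact, box-shaped object fills its box completely.
block = np.zeros((100, 100), dtype=bool)
block[20:40, 30:70] = True
print(box_fill_ratio(block))  # 1.0
```

For a diagonal line the ratio falls as 1/n with image size n, which is why a pixel-level mask suits power lines better than a rectangle.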

Power line detection method based on semantic segmentation
Power line detection based on semantic segmentation differs considerably from target detection. First, semantic segmentation labels can follow the outline of the object during labelling, so the detected region is more accurate. Second, semantic segmentation performs pixel-level detection and therefore has a stronger ability to detect the local features of the target. Commonly used semantic segmentation detection algorithms can currently be divided into methods based on generative adversarial network structures [27] and methods based on encoder-decoder (codec) structures [28]. GAN-based semantic segmentation relies mainly on the adversarial game between a discriminator and a generator. Representative results include the following: Chen Wenxiang et al. [29] proposed the InsulatorGAN network model and the InsuGenSet dataset for power detection; Wang Jingyu et al. [30] used the CPLD dataset as the research object and proposed the KCIGD generative adversarial network model for power detection; Rabab Abdelfattah [31] proposed PLGAN for power line detection, building a cross-domain power line detection model with a generative adversarial network. Although GAN-based semantic segmentation models can be applied to power detection, their complex structure makes them difficult to train, so their convergence is mostly unsatisfactory. In contrast, semantic segmentation based on encoder-decoder structures has also been applied to power line detection [32]. Yang Lei et al. [33] proposed a PLE-Net-based power line segmentation detection model that extracts and fuses features from both infrared and visible-light views; its average precision (AP) for grid detection reaches 40.06%, clearly better than that of other models. Hu Fanqui et al. [34] introduced a Gaussian model into the distribution of power equipment information, which improved model performance.
Overall, power line detection models with encoder-decoder structures are not only easy to train but also capable of fine-grained detection, so this approach is more meaningful in practical applications.

A real-time power line segmentation method based on multi-attention mechanism and strong semantic feature extractor
In this section, the real-time power line segmentation method based on a multi-attention mechanism and strong semantic feature extractor is described in detail.
The DeepLab V3+ model is a typical semantic segmentation network with an encoder-decoder structure. In the encoder, it uses MobileNetV2 [35] and the Atrous Spatial Pyramid Pooling (ASPP) feature fusion module for feature extraction and encoding; in the decoder, it uses two successive stages of fourfold upsampling and convolution. Compared with UNet [36], PSPNet [37], HRNet [38] and similar models, DeepLab V3+ is less computationally intensive and therefore more suitable for fast detection. However, its oversimplified structure leads to poor feature extraction ability and weak nonlinear fitting ability, which make it difficult to complete the complex power line segmentation task directly. Therefore, this paper proposes a fast power line segmentation detection method based on a multi-attention mechanism and a strong semantic feature extractor; the complete network structure is shown in Fig. 2.
The encoder operates as follows. First, for a 512×512×3 input image, multi-stage MobileNetV2 feature extraction produces feature maps of size 256×256×16, 128×128×24, 64×64×32, and so on. The feature map is then globally encoded by the CA attention mechanism to obtain a feature map of size 64×64×320, after which the five-branch ASPPAttention structure performs multi-scale feature fusion to obtain a feature map of size 256×256×64.
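For illustration, the coordinate attention step applied to the 64×64×320 map can be sketched in numpy, with the map stored as (C, H, W) = (320, 64, 64). The sketch follows the general CA design [13]: pooling along each spatial axis separately, a shared 1×1 transform, then separate sigmoid gates per row and per column. The random weights, the omitted BatchNorm, the ReLU used in place of h-swish, the reduction ratio and the function name are all assumptions, not the trained module of this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, reduction=32, rng=None):
    """Coordinate attention on a (C, H, W) feature map (numpy sketch).

    Random weights stand in for learned 1x1 convolutions; BatchNorm is
    omitted and ReLU replaces h-swish for brevity.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c, h, w = x.shape
    mid = max(8, c // reduction)
    w_shared = rng.standard_normal((mid, c)) * 0.1   # 1x1 conv, C -> mid
    w_h = rng.standard_normal((c, mid)) * 0.1        # 1x1 conv for the height gate
    w_w = rng.standard_normal((c, mid)) * 0.1        # 1x1 conv for the width gate

    z_h = x.mean(axis=2)                       # (C, H): average over width
    z_w = x.mean(axis=1)                       # (C, W): average over height
    y = np.concatenate([z_h, z_w], axis=1)     # (C, H + W)
    y = np.maximum(w_shared @ y, 0.0)          # (mid, H + W)
    a_h = sigmoid(w_h @ y[:, :h])[:, :, None]  # (C, H, 1) gate per row
    a_w = sigmoid(w_w @ y[:, h:])[:, None, :]  # (C, 1, W) gate per column
    return x * a_h * a_w

x = np.random.default_rng(1).standard_normal((320, 64, 64))
out = coordinate_attention(x)
print(out.shape)  # (320, 64, 64)
```

Because each gate depends on a row or column position, the attention map keeps positional information along both axes, which is what lets CA respond to long, thin structures such as wires.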
The decoder is shown as Decoder in Fig. 2. First, the encoded features are upsampled fourfold and fused through the CA attention mechanism; the inverted bottleneck convolution block is then used for further feature extraction, and a second fourfold upsampling yields a feature map of size 130×130×7. Finally, the CA attention mechanism is introduced before the prediction layer to further strengthen the model's attention to locally valid information and its focus on deep semantic information.

Improved MobileNetV2 network model
The original DeepLab V3+ network uses the multi-branch Xception structure [39] for feature extraction. Xception has many branches; although its feature extraction capability is strong, it runs slowly and places high performance demands on edge devices. In addition, Xception uses a large number of standard convolutions, which further limits the inference speed of the network. MobileNetV2, by contrast, has fewer branches, which reduces the model's dependence on the computing power of edge devices, and its basic unit uses many depthwise separable convolutions, which effectively reduce the spatial complexity and the number of parameters of the model. The computational cost of standard convolution and depthwise separable convolution is compared in Eq. 1.
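Assuming Eq. 1 follows the usual cost model for a stride-1, same-padded convolution (multiply-accumulate counts; the exact form used in the paper may differ), the two costs and their ratio are:

$$G_C = K^2 C_{in} C_{out} H W, \qquad G_{DW} = K^2 C_{in} H W + C_{in} C_{out} H W, \qquad \frac{G_{DW}}{G_C} = \frac{1}{C_{out}} + \frac{1}{K^2}$$

A short Python check of this ratio for one hypothetical layer (3×3 kernel, 32 to 64 channels, 128×128 feature map):

```python
def conv_flops(k, c_in, c_out, h, w):
    """Multiply-accumulates of a standard K x K convolution (stride 1, same padding)."""
    return k * k * c_in * c_out * h * w

def dw_separable_flops(k, c_in, c_out, h, w):
    """Depthwise K x K convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in * h * w      # one K x K filter per input channel
    pointwise = c_in * c_out * h * w      # 1 x 1 conv mixes channels
    return depthwise + pointwise

g_c = conv_flops(3, 32, 64, 128, 128)
g_dw = dw_separable_flops(3, 32, 64, 128, 128)
print(g_c, g_dw)
print(g_dw / g_c)  # equals 1/64 + 1/9, roughly 0.127
```

With K = 3 the depthwise separable layer costs roughly one eighth of the standard one, and the ratio approaches 1/K² as the number of output channels grows.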
In Eq. 1, $G_{DW}$ denotes the floating-point operations of depthwise separable convolution, $G_C$ the floating-point operations of standard convolution, $C_{in}$ the number of input channels, $C_{out}$ the number of output channels, $H$ and $W$ the height and width of the feature map, and $K$ the size of the convolution kernel. Eq. 1 shows that depthwise separable convolution greatly reduces model complexity, so MobileNetV2 can effectively reduce the computation of the model and accelerate detection. However, MobileNetV2 used directly cannot perform power line segmentation well, for two main reasons. (1) The basic unit of MobileNetV2 uses the ReLU6 activation function, whose mapping ability in 3D space is poor. (2) Contextual information interacts poorly between the downsampling stages of MobileNetV2. To address these problems, this paper makes the following two improvements to the MobileNetV2 network. (1) FReLU activation function [40]: the original MobileNetV2 basic unit uses ReLU6 as its activation function, calculated as shown in Eq. 2.
As Eq. 2 shows, this activation function clearly truncates its input during mapping, so irregular edges are forcibly fitted during segmentation and segmentation ability is somewhat lacking. The FReLU activation function takes into account the relationship between the target pixel and its surrounding pixels during spatial mapping; through its funnel condition $T(x_{c,i,j})$, edge features of arbitrary shape can in theory be preserved completely and effectively. The FReLU activation function is given in Eq. 3, and the mapping behaviour of FReLU and ReLU6 is compared in Fig. 3. Here $f(x_{c,i,j})$ denotes the output of the activation function under the predefined funnel condition, $x^{\omega}_{c,i,j}$ denotes the window on the $c$-th channel centred on the 2D position $(i,j)$, $p^{\omega}_{c}$ denotes the parameters shared by this window within the same channel, ReLU6 denotes the output of the ReLU6 activation function, and $x$ denotes the input.
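Eqs. 2 and 3 can be sketched in numpy, assuming the standard forms ReLU6(x) = min(max(0, x), 6) and FReLU f(x_{c,i,j}) = max(x_{c,i,j}, T(x_{c,i,j})), where the funnel condition T is a per-channel 3×3 convolution whose weights play the role of p^ω_c. The BatchNorm that normally follows T is omitted, and the kernels in the usage lines are illustrative only.

```python
import numpy as np

def relu6(x):
    """ReLU6(x) = min(max(0, x), 6), applied elementwise."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def depthwise_conv3x3(x, weights):
    """Per-channel 3x3 convolution with zero padding on a (C, H, W) array."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            # Each channel has its own 3x3 kernel, shared over all positions.
            out += weights[:, di, dj][:, None, None] * padded[:, di:di + h, dj:dj + w]
    return out

def frelu(x, weights):
    """FReLU: max(x, T(x)), with T a depthwise 3x3 funnel condition (no BatchNorm)."""
    return np.maximum(x, depthwise_conv3x3(x, weights))

print(relu6(np.array([-2.0, 3.0, 8.0])))  # [0. 3. 6.]

# A single bright pixel: with an averaging funnel kernel, its neighbours are
# lifted to about 1.0 instead of being clipped independently of context.
x = np.zeros((1, 3, 3))
x[0, 1, 1] = 9.0
mean_kernel = np.full((1, 3, 3), 1.0 / 9.0)
print(frelu(x, mean_kernel))
```

Unlike ReLU6, whose output at a pixel depends only on that pixel, FReLU's output depends on its 3×3 neighbourhood, which is the spatial context the text relies on for irregular edges.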
(2) CBAM attention mechanism: neural networks continuously apply convolution and downsampling to mine valid information at different depths. During feature extraction, the feature map size is repeatedly halved while the number of channels is repeatedly doubled, so the network progresses from shallow positional information to deep semantic information. Because the feature map keeps shrinking, a semantic gap may arise in the contextual information exchanged between stages of the network. To avoid this problem, this paper introduces the CBAM attention mechanism into each feature extraction module of the network, enhancing the interaction of contextual information between modules from both the channel and the spatial perspective. The structure of CBAM is shown in Fig. 4.
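A minimal numpy sketch of CBAM as described by Woo et al. [12] may help: channel attention from a shared MLP over average- and max-pooled descriptors, followed by spatial attention from a 7×7 convolution over the channel-wise average and max maps. The reduction ratio of 16, the random stand-in weights and the function name are assumptions for illustration, not the trained module of this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, reduction=16, kernel=7, rng=None):
    """CBAM on a (C, H, W) map: channel attention, then spatial attention.

    Random weights stand in for the learned shared MLP and spatial convolution.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c, h, w = x.shape
    hidden = max(1, c // reduction)
    w1 = rng.standard_normal((hidden, c)) * 0.1      # shared MLP, C -> C/r
    w2 = rng.standard_normal((c, hidden)) * 0.1      # shared MLP, C/r -> C
    k_sp = rng.standard_normal((2, kernel, kernel)) * 0.1

    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    # Channel attention from average- and max-pooled channel descriptors.
    m_c = sigmoid(shared_mlp(x.mean(axis=(1, 2))) + shared_mlp(x.max(axis=(1, 2))))
    x = x * m_c[:, None, None]

    # Spatial attention: convolve the channel-wise average and max maps.
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    pad = kernel // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    logits = np.zeros((h, w))
    for di in range(kernel):
        for dj in range(kernel):
            window = padded[:, di:di + h, dj:dj + w]
            logits += np.tensordot(k_sp[:, di, dj], window, axes=1)
    return x * sigmoid(logits)[None, :, :]

feat = np.random.default_rng(1).standard_normal((24, 32, 32))
refined = cbam(feat)
print(refined.shape)  # (24, 32, 32)
```

Both attention maps lie in (0, 1), so CBAM only reweights features; it does not change the shape of the stage output, which is why it can be dropped into each MobileNetV2 module.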
In summary, the structure of the improved MobileNetV2 basic module is shown in Fig. 5; the modules with stride 1 and stride 2 are calculated using Eqs. 4-5.
In Eqs. 4-5, CBAM denotes the CBAM attention operation, FReLU the FReLU activation operation, C the convolution operation, and DW the depthwise separable convolution operation.

ASPPAttention structure
As MobileNetV2 deepens, the information it extracts becomes increasingly abstract; at the deepest layer, the feature map information is most abstract and the number of channels is largest. To prevent invalid information from interfering with the detection results, this paper places the CA attention mechanism at the input of the ASPP structure. At the input, the pixel-level fine-grained encoding of CA first strengthens the model's attention to deep, fine-grained, valid semantic information; the improved ASPP module then performs feature fusion. Adding a 1×1 convolution layer to each of the five ASPP branches improves the nonlinear fitting ability of the model. In addition, because the branches of the original ASPP structure are independent of one another and cannot exchange information between groups, the features of each group contribute unequally to the network, which limits its performance. To improve performance, a channel shuffle mechanism is added before the five branches are fused, establishing communication between the features of each group. The ASPPAttention structure is shown in Fig. 6.
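The channel shuffle step can be sketched as below, assuming the ShuffleNet-style operation (reshape into groups, swap the group and channel axes, flatten) applied to the concatenated output of the five ASPP branches. The (C, H, W) layout, the two channels per branch and the function name are assumptions for illustration.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave the channels of a (C, H, W) array across `groups` groups."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # (groups, C/groups, H, W) -> swap the first two axes -> flatten back to C.
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# Five ASPP branches with two channels each; label channel k with the value k.
x = np.arange(10, dtype=float)[:, None, None] * np.ones((10, 4, 4))
print(channel_shuffle(x, groups=5)[:, 0, 0])
# [0. 2. 4. 6. 8. 1. 3. 5. 7. 9.]
```

After shuffling, each consecutive block of five channels contains one channel from every branch, so any later grouped or local operation sees information from all dilation rates at once.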

Strong semantic feature extraction decoder
The decoder of the original DeepLab V3+ model recovers deep semantic information and shallow positional information through two consecutive fourfold upsampling operations and then performs detection directly. The original decoder is therefore relatively simple, and its nonlinear expression ability is weak. To improve this ability, this paper proposes a strong semantic feature extraction decoder with learning capability, which inserts an inverted bottleneck convolution between the two fourfold upsampling modules. The operation of the inverted bottleneck convolution is shown in Table 1.
As Table 1 shows, the inverted bottleneck convolution first expands the input h×w×k feature map to h×w×kt and then applies a 1×1 convolution to reduce it back to h×w×k. This operation enlarges the receptive field and thus enables better feature extraction. Introducing the CA attention mechanism before the output effectively improves the model's attention to locally valid information during decoding. The calculation is given in Eq. 6.
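A hedged numpy sketch of one inverted bottleneck block follows. It assumes the MobileNetV2-style ordering (1×1 expansion from k to kt channels, 3×3 depthwise convolution, 1×1 projection back to k) with ReLU and a residual connection; the exact kernels and activations listed in Table 1, and the CA step before the output, may differ and are not reproduced. The function names and the expansion factor t = 4 are hypothetical.

```python
import numpy as np

def inverted_bottleneck(x, t=4, rng=None):
    """Expand k -> k*t (1x1), depthwise 3x3, project k*t -> k (1x1), add residual."""
    rng = np.random.default_rng(0) if rng is None else rng
    k, h, w = x.shape
    w_expand = rng.standard_normal((k * t, k)) * 0.1    # 1x1 conv, k -> k*t
    w_dw = rng.standard_normal((k * t, 3, 3)) * 0.1     # one 3x3 kernel per channel
    w_project = rng.standard_normal((k, k * t)) * 0.1   # 1x1 conv, k*t -> k

    y = np.maximum(np.einsum("oc,chw->ohw", w_expand, x), 0.0)  # (k*t, H, W)
    padded = np.pad(y, ((0, 0), (1, 1), (1, 1)))
    z = np.zeros_like(y)
    for di in range(3):
        for dj in range(3):
            z += w_dw[:, di, dj][:, None, None] * padded[:, di:di + h, dj:dj + w]
    z = np.maximum(z, 0.0)
    out = np.einsum("oc,chw->ohw", w_project, z)        # back to (k, H, W)
    return x + out                                      # residual connection

def inverted_bottleneck_params(k, t=4):
    """Weight count (biases ignored) of the block above."""
    return k * (k * t) + 9 * (k * t) + (k * t) * k

x = np.random.default_rng(1).standard_normal((16, 32, 32))
print(inverted_bottleneck(x).shape)                     # (16, 32, 32)
print(inverted_bottleneck_params(64, 4), 9 * 64 * 64)   # block vs. one plain 3x3 conv
```

With k = 64 and t = 4 the block has 35,072 weights, slightly fewer than a single 64-to-64 3×3 convolution (36,864), while adding two extra nonlinearities, which matches the stated goal of more nonlinear expression at small parameter cost.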

Introduction to the experimental dataset and training strategy
The TTPLA dataset was used in the experiments; its label statistics are shown in Table 2.
Table 2 shows that the TTPLA dataset contains five categories: 8,083 cable labels, 330 tower_lattice labels, 168 tower_tucohy labels, 283 tower_wooden labels and 173 void labels. The TTPLA dataset therefore has two major deficiencies: the label distribution is uneven and the number of samples is insufficient. To prevent this data imbalance from limiting model performance during training, this paper adopts two learning strategies, transfer learning and Dice Loss: transferring weights pre-trained on the VOC dataset improves the generalisation of the model, and Dice Loss alleviates the performance limitations caused by sample imbalance. Dice Loss is defined in Eq. 7.
In Eq. 7, DiceLoss is the output of Dice Loss, Dice is the Dice coefficient, $t_i$ is the target value, $y_i$ is the predicted value of the network, and a small smoothing factor is included to avoid numerical problems such as division by zero and unstable gradients. Because Dice Loss compares the overlap between the predicted and actual values with their total size, rather than averaging a per-pixel error in which majority-class pixels dominate, it can effectively reduce the impact of sample imbalance on model performance.
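Assuming the common soft Dice form Dice = (2 Σ t_i y_i + s) / (Σ t_i + Σ y_i + s) with DiceLoss = 1 - Dice, where s is the smoothing factor (named `smooth` here, an assumed name and default), the sketch below also shows why pixel accuracy alone hides class imbalance. For several classes, this binary loss would typically be averaged over per-class masks.

```python
import numpy as np

def dice_loss(pred, target, smooth=1e-6):
    """Soft Dice loss: 1 - (2*sum(t*y) + smooth) / (sum(t) + sum(y) + smooth).

    pred holds predicted foreground probabilities, target holds 0/1 labels;
    both are flattened, so any shape works.
    """
    y = np.asarray(pred, dtype=float).ravel()
    t = np.asarray(target, dtype=float).ravel()
    dice = (2.0 * np.sum(t * y) + smooth) / (np.sum(t) + np.sum(y) + smooth)
    return 1.0 - dice

# Hypothetical 10 x 10 mask in which only 2 pixels belong to a thin cable.
target = np.zeros((10, 10))
target[4, 5:7] = 1
all_background = np.zeros((10, 10))
print(np.mean(all_background == target))   # 0.98 pixel accuracy
print(dice_loss(all_background, target))   # close to 1.0: the cable is missed
print(dice_loss(target, target))           # close to 0.0: perfect overlap
```

A model that predicts only background scores 98% pixel accuracy here, yet its Dice Loss is almost maximal, so the loss keeps pushing the network toward the rare cable pixels.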

Evaluation metrics and comparison of results
To evaluate the model comprehensively, several metrics were used: mean intersection over union (mIoU), mean pixel accuracy (MPA) and accuracy (Acc), calculated as shown in Eqs. 8-10.
In Eqs. 8-10, $k+1$ is the number of categories; $p_{ii}$ is the number of pixels of class $i$ that are correctly classified; $p_{ij}$ is the number of pixels of class $i$ predicted as class $j$; $p_{ji}$ is the number of pixels of class $j$ predicted as class $i$; TP is the number of power patrol targets that the network detects and classifies correctly; FN is the number of power patrol targets that are not correctly detected; and FP is the number of background pixels incorrectly classified as power patrol targets. Acc, mIoU and MPA are used together to verify the detection performance of the model. To assess the strengths and weaknesses of the proposed algorithm, the model is compared with current mainstream semantic segmentation models; the results are shown in Table 3, and the actual detection results are visualised in Fig. 7.
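The mIoU and MPA definitions above can be computed from a confusion matrix whose entry (i, j) is p_ij. In this sketch Acc is taken as overall pixel accuracy, which is an assumption; the TP/FN/FP form of the paper's accuracy formula may weight classes differently. The function names are hypothetical.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """p[i, j] = number of pixels of class i predicted as class j."""
    idx = num_classes * np.asarray(gt).ravel() + np.asarray(pred).ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(gt, pred, num_classes):
    """mIoU, MPA and overall pixel accuracy from the confusion matrix."""
    p = confusion_matrix(gt, pred, num_classes).astype(float)
    p_ii = np.diag(p)
    with np.errstate(divide="ignore", invalid="ignore"):
        # IoU_i = p_ii / (sum_j p_ij + sum_j p_ji - p_ii)
        iou = p_ii / (p.sum(axis=1) + p.sum(axis=0) - p_ii)
        # Per-class pixel accuracy: p_ii / sum_j p_ij
        class_acc = p_ii / p.sum(axis=1)
    return {
        "mIoU": np.nanmean(iou),        # skips classes absent from both gt and pred
        "MPA": np.nanmean(class_acc),   # skips classes absent from gt
        "Acc": p_ii.sum() / p.sum(),
    }

gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(segmentation_metrics(gt, pred, num_classes=2))
```

In this toy case the per-class IoUs are 1/2 and 2/3, giving an mIoU of about 0.583, while MPA and Acc are both 0.75; mIoU penalises the false positive more strongly because it enlarges the union.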
Table 3, Fig. 7 and the related literature show the following. (1) Compared with HRNet, the proposed model improves mIoU by 30% and MPA by 39.37%. The main reason is that HRNet relies on a single semantic feature extraction path for detection; although it increases the interaction between feature layers, it is still weaker at global semantic extraction than a model with a feature fusion structure. HRNet also does not constrain model size, so its detection speed is only 18.5 FPS, 22.3 FPS slower than the proposed model. (2) PSPNet mainly uses a global pooling structure to obtain multi-scale effective information and then performs detection directly. Compared with the proposed model, it is essentially an encoder-only structure, so it can extract features and perform segmentation faster, but its segmentation performance is poorer; PSPNet therefore does not balance speed and performance as well as the proposed network. (3) Compared with UNet using VGG16, the proposed model improves mIoU by 29.4%, mainly because UNet lacks a feature fusion structure and therefore ignores the contribution of global feature fusion to model performance. The proposed model thus has a clear advantage over the UNet encoder-decoder model. When ResNet is used for feature extraction, UNet performs better than with VGG16, but there is still a significant gap between its performance and that of the proposed network. Moreover, because of the high complexity of VGG16 and ResNet50, these models detect slowly; UNet with a VGG16 encoder reaches only 19 FPS. Taken together, these experimental results show that the proposed model better balances speed and performance in power line segmentation detection.

Ablation experiments
The main improvements of the real-time power patrol model based on multiple attention mechanisms and a strong semantic feature extractor are: the CBAM attention mechanism in MobileNetV2 (change1), the FReLU activation function in MobileNetV2 (change2), the ASPPAttention structure (change3), the strong semantic feature extraction decoder (change4), Dice Loss (change5), and transfer learning (change6). The results of the ablation experiments are shown in Table 4, where "✓" indicates that the improvement strategy was used.
Analysis of Table 4 shows the following. (1) After the FReLU activation function is introduced, the mIoU of the model increases from 36.5% to 42.1% and the MPA from 38.8% to 40.6%, indicating that the segmentation capability of the model has improved. The visualised detection results are shown in Fig. 8, where Fig. 8b shows the results of the original DeepLab V3+ model and Fig. 8c the results after introducing FReLU. With FReLU, false detections decrease, but the model still pays little attention to local target information, so many pixels are missed during segmentation. (2) To improve the model's attention to pixels, the CBAM attention mechanism is introduced into the MobileNetV2 structure of the encoder, as shown in Table 4 and Fig. 8. CBAM enhances the model's feature extraction ability, raising MPA by 1.8%. Because the decoder is still weak, the model produces some false detections, as shown in Fig. 8d, but its segmentation of the target objects is significantly better than before CBAM was introduced. (3) The ASPPAttention enhanced multi-scale feature fusion module further improves the model's ability to mine fine-grained features, so both mIoU and MPA improve to some extent. Accordingly, as shown in Fig. 8d, the segmentation capability of the model for the detected targets is further enhanced. (4) The strong semantic feature extraction decoder, which has learning capability, compensates for pixels that the encoder extracts incorrectly. (5) After Dice Loss is applied, detection performance improves further because the sample imbalance problem is alleviated. These ablation experiments demonstrate that the improvements proposed in this paper effectively solve many problems in power patrol.

Failure case analysis
Although the proposed real-time power line segmentation algorithm based on a multi-attention mechanism with a strong semantic feature extractor has some advantages over existing methods, its performance is limited in some specific scenarios with strong interference, as shown in Fig. 9. Because tree branches resemble power lines in colour, shape and so on, some branches are incorrectly classified as wires, as shown in Fig. 9a and b. Furthermore, in regions with strong interference, as shown in Fig. 9c and d, the power line was affected by sunlight during data acquisition, which results in incomplete segmentation. These failures can be addressed as follows. Misclassifications caused by excessive similarity, such as branches, can be removed with image morphology techniques. In incompletely segmented images, the overall structure of the power line is extracted but some local segments are missing, so approximate fitting or generative adversarial network methods can be used to repair the missing regions. The segmentation accuracy of the proposed method therefore still needs further improvement to adapt to more complex scenarios.
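As one possible form of the approximate fitting mentioned above, the sketch below fits a quadratic to the pixel coordinates of a single, roughly horizontal power line and redraws it across the gap; for small sag, a parabola is a reasonable approximation of a hanging span. The helper name, the single-line assumption and the polynomial degree are hypothetical choices, not part of the proposed method.

```python
import numpy as np

def fill_line_gaps(mask, degree=2):
    """Redraw one roughly horizontal power line across gaps by polynomial fitting.

    Assumes the boolean mask contains a single line, so each column holds at
    most a few line pixels.
    """
    rows, cols = np.nonzero(mask)
    # Least-squares fit of the row coordinate as a polynomial in the column.
    coeffs = np.polyfit(cols, rows, degree)
    filled = mask.copy()
    x = np.arange(cols.min(), cols.max() + 1)
    y = np.rint(np.polyval(coeffs, x)).astype(int)
    valid = (y >= 0) & (y < mask.shape[0])
    filled[y[valid], x[valid]] = True
    return filled

# Hypothetical sagging line with columns 40-59 lost (for example, to glare).
h, w = 50, 100
xs = np.arange(w)
yt = np.rint(0.004 * (xs - 50) ** 2 + 10).astype(int)
mask = np.zeros((h, w), dtype=bool)
mask[yt, xs] = True
mask[:, 40:60] = False
filled = fill_line_gaps(mask)
print(filled[:, 40:60].any(axis=0).all())  # every gap column now has a line pixel
```

This only suits gaps in an otherwise well-segmented line; when several lines cross or large sections are missing, a learned repair such as the GAN approach mentioned above would be more appropriate.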

Conclusion and outlook
Aiming at the high complexity, slow detection speed and poor detection performance of existing power line detection methods, this paper proposes a real-time segmentation and detection method for power lines based on a multi-attention mechanism and a strong semantic feature extractor, using DeepLabV3+ as the base model. The encoder adopts MobileNetV2 and the ASPPAttention structure, both with a small number of parameters, to achieve fast extraction of multi-scale features. In the decoder, a strong semantic feature extraction decoder based on the inverted bottleneck convolution structure is proposed; introducing inverted bottleneck convolution deepens the network and improves the model's nonlinear expressive ability. Two training strategies are used for model iteration: transferring pre-trained weights and using Dice Loss as the loss function. The trained model achieves an mIoU of 47.80% and an accuracy of 98.50%, significantly better than the other network models compared for power line segmentation. Meanwhile, the model reaches a detection speed of 40.80 FPS, striking a good balance between detection speed and detection accuracy. Subsequent research will focus on collecting more images to further expand the dataset, improving the generalisation ability of the model, deploying it on edge devices and designing the host interface.
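The two headline metrics can behave very differently when wire pixels are a small fraction of each image. The sketch below uses the standard confusion-matrix definitions of pixel accuracy and mean IoU, assuming two classes (0 = background, 1 = power line) and flattened integer label maps. The toy numbers are illustrative, not taken from the experiments, the text does not state which classes its mIoU averages over, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[int]]


def confusion_matrix(pred: Sequence[int], label: Sequence[int], num_classes: int) -> Matrix:
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    if len(pred) != len(label):
        raise ValueError("pred and label must have the same length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, label):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: Matrix) -> float:
    """Fraction of all pixels that are classified correctly."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_iou(cm: Matrix) -> Dict[int, float]:
    """IoU_c = TP / (TP + FP + FN) for every class c that occurs at all."""
    ious: Dict[int, float] = {}
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                             # true c, predicted other
        fp = sum(cm[r][c] for r in range(len(cm))) - tp  # predicted c, true other
        if tp + fp + fn > 0:
            ious[c] = tp / (tp + fp + fn)
    return ious


def mean_iou(cm: Matrix) -> float:
    """Average IoU over the classes that occur in prediction or label."""
    ious = per_class_iou(cm)
    return sum(ious.values()) / len(ious)


# Toy 1000-pixel image: 20 wire pixels, of which 8 are found and 12 missed,
# plus 6 background pixels wrongly labelled as wire.
label = [1] * 20 + [0] * 980
pred = [1] * 8 + [0] * 12 + [1] * 6 + [0] * 974
cm = confusion_matrix(pred, label, num_classes=2)
print(round(pixel_accuracy(cm), 3))    # -> 0.982
print(round(per_class_iou(cm)[1], 3))  # -> 0.308
print(round(mean_iou(cm), 3))          # -> 0.645
```

In the toy case the wire class reaches an IoU of only about 0.31 while pixel accuracy is 0.982, because the background dominates the pixel count; this is why mIoU is the more informative of the two metrics for thin structures such as power lines.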
Author Contributions All authors made significant contributions to this manuscript. TJ and QZ wrote the main text of the manuscript; SL, WY and CY drew all the figures; and all authors reviewed the manuscript.

Fig. 1 Examples of sample images from the TTPLA dataset

Fig. 2 Structure of real-time power line detection model based on multi-attention with strong semantic feature extractor

Fig. 5 Fig. 6
Fig. 5 Improved MobileNetV2 basic unit module: a shows the basic module with a stride of 1, and b shows the basic module with a stride of 2

Fig. 7 Comparison chart of multi-model segmentation detection results

Fig. 8 Fig. 9
Fig. 8 Effect of each strategy in the ablation experiment on the detection results: a is the image to be detected, b is the detection result before the improvement, c is the detection result after the

Table 1
Inverted bottleneck convolution structure table. Here, $y_i$ represents the output, CA represents the CA attention mechanism operation, U represents fourfold upsampling, I represents the inverted bottleneck convolution operation, and $Z_i$ represents the output of ASPPAttention; $x_i$ represents the output of the deepest layer of the backbone network.

The input image size is set to 512×512, the initial learning rate to 5e-4, and the Adam optimiser is used for pre-training. First, the model is trained for 100 epochs on the VOC dataset to obtain the pre-training weights; these weights are then frozen, the learning rate is set to 0.001, and fine-tuning is carried out to obtain the final results.
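The two-stage schedule above can be summarised as a small configuration object. This is a sketch under stated assumptions: the fine-tuning dataset is assumed to be TTPLA (the dataset shown in Fig. 1), nothing is assumed to be frozen during VOC pre-training, values the text leaves unstated (fine-tuning epochs and optimiser, exactly which layers are frozen) are kept as `None` or described generically, and the names `TrainingStage`, `SCHEDULE` and `describe` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrainingStage:
    """One stage of the two-stage schedule described in the text."""
    name: str
    dataset: str
    optimizer: Optional[str]              # None where the text names no optimiser
    learning_rate: float
    epochs: Optional[int]                 # None where the text gives no epoch count
    freeze_previous_stage_weights: bool   # exact frozen layers are not specified


INPUT_SIZE: Tuple[int, int] = (512, 512)  # (height, width) of resized inputs

SCHEDULE = (
    # Assumption: nothing is frozen during VOC pre-training.
    TrainingStage("pretrain", "VOC", "Adam", 5e-4, 100, False),
    # Assumption: fine-tuning uses TTPLA, the dataset shown in Fig. 1.
    TrainingStage("finetune", "TTPLA", None, 1e-3, None, True),
)


def describe(stage: TrainingStage) -> str:
    """Render a stage as a one-line summary, e.g. for a training log."""
    epochs = "unspecified" if stage.epochs is None else str(stage.epochs)
    optimizer = stage.optimizer or "unspecified"
    frozen = ("previous-stage weights frozen"
              if stage.freeze_previous_stage_weights else "nothing frozen")
    return (f"{stage.name}: {stage.dataset}, {optimizer}, "
            f"lr={stage.learning_rate:g}, epochs={epochs}, {frozen}")


for stage in SCHEDULE:
    print(describe(stage))
```

Keeping unstated values as `None` rather than guessing makes the gaps in the reported training setup explicit.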

Table 3
Multi-model performance comparison table. Bold values indicate the best result for each metric across the compared models

Table 4
Ablation experiment comparison. Bold values indicate the best result for each metric across the compared configurations