YOLOOD: an arbitrary-oriented flexible flat cable detection method in robotic assembly

Flexible flat cable (FFC) detection is a prerequisite for robotic 3C assembly and is challenging because FFCs are often non-axis-aligned, arbitrarily oriented, and surrounded by clutter. To date, however, traditional robotic object detection methods have mainly regressed horizontal bounding boxes, whose size and aspect ratio neither reflect the actual shape of the target object nor separate densely placed FFCs. In this paper, rotated object detection is introduced into FFC detection, and a YOLO-based arbitrary-oriented FFC detection method named YOLOOD is proposed. Firstly, oriented bounding boxes are used to reflect the object's physical size and angle information and to better separate FFCs from the dense background. Secondly, the circular smooth label (CSL) angular classification algorithm is adopted to obtain the angle information of FFCs. Finally, a head-point regression branch is introduced to distinguish the head from the tail of the FFC and to expand the FFC detection angle range to [0°, 360°). The proposed YOLOOD reaches an average precision of 90.82% and a detection speed of 112 FPS on an FFC dataset. Meanwhile, an actual FFC grasping experiment demonstrated the proposed YOLOOD's effectiveness and feasibility in practical assembly scenarios. In conclusion, this paper innovatively introduces rotated object detection into robotic object detection, and the proposed YOLOOD solves the problem of detecting and locating non-axis-aligned FFCs, which is of particular significance for robotic 3C assembly.


Introduction
The last decade has witnessed rapid development of information technology, which has continuously increased the demand for 3C products, especially mobile phones. At the same time, the automatic assembly of mobile phones has attracted the attention of researchers seeking to reduce labor costs. The flexible flat cable (FFC), one of the critical components of mobile phones for connecting hardware and transferring data, needs to be accurately detected, located, grasped, and installed. However, 3C product assembly lines are mainly designed for manual assembly [1], so the FFCs to be assembled are often placed randomly on the conveyor belt. The rapid development of computer vision has made it possible to detect and locate FFCs automatically with visual technology, replacing manual operation and improving operational efficiency. Therefore, it is of great significance to carry out research on the detection and positioning of FFCs in robotic assembly.
For object detection and localization in robotic assembly, traditional methods rely mainly on handcrafted features [2], such as SIFT [3], HOG [4], and LBP [5]. However, most traditional methods are not robust to changes in the object's size, shape, illumination, position, and background. When these conditions change, the performance of the detector decreases. In recent years, advancements in deep learning have shown great power in visual perception tasks [6][7][8][9][10]. Meanwhile, more researchers have adopted deep learning algorithms to detect and locate objects in robotic assembly tasks [11]. The deep learning algorithm can adaptively acquire the characteristics of the object and is robust to the changes in the complex environment of robotic assembly. Most object detection methods based on deep learning follow two kinds of paradigms [12]: RCNN [10] and YOLO [8]. The difference is whether there is an explicit region proposal generation process.
[13] introduced the faster and stronger YOLOv5 architecture, which achieves a balance of speed and precision that fits the assembly scene. However, the above deep learning methods locate objects through horizontal bounding boxes (HBB), as shown in Fig. 1a. When facing objects with arbitrary orientations, such as FFCs, the size and aspect ratio of an HBB cannot reflect the actual shape of the target object. Furthermore, HBBs cannot distinguish objects from background pixels well when objects are dense. Fortunately, rotated object detection methods with oriented bounding boxes (OBB) solve these problems well, as shown in Fig. 1b.
Recently, some progress has been made in arbitrary-oriented object detection in aerial imagery and scene text [14][15][16]. However, few have applied it to object detection in robotic assembly. In this paper, we introduce arbitrary-oriented object detection into FFC assembly, aiming to solve the problems of arbitrary FFC orientation and dense backgrounds during robotic assembly. To this end, a YOLO-based method for arbitrary-oriented FFC detection named YOLOOD is proposed. Specifically, YOLOOD uses YOLOv5 as the backbone for efficiency and accuracy. Then, three-scale fused features are sent to the head network to predict the oriented bounding box. In contrast to the conventional orientation regression approach, we introduce CSL [17] to detect the angle of the FFC, which turns the angle regression problem into a classification problem and addresses the periodicity of the angle. Moreover, a head-point detection branch is proposed, allowing us to extend the angle range to [0°, 360°) and distinguish the head and tail of the FFC. In general, the main contributions of this paper are summarized as follows:
(1) We innovatively introduce the arbitrary-oriented object detection method into the FFC detection task, aiming to accurately detect FFCs at any angle and reduce the interference of background information. Specifically, a YOLO-based method named YOLOOD for arbitrary-oriented FFC detection and localization is proposed.
(2) CSL is introduced to determine FFC angles, transforming angle regression into classification. This solves the problem of discontinuous boundaries in angle regression and allows accurate localization of the FFC.
(3) A head-point regression branch is added to the head network, which allows us to determine the angle of the FFC in the range [0°, 360°) and thereby distinguish the head and tail of the FFC for subsequent robot operations.
(4) The proposed YOLOOD reaches an average precision of 90.82% and a detection speed of 112 FPS on the FFC dataset. The achieved accuracy and efficiency enable low-tolerance downstream assembly tasks: an industrial robot can complete an FFC grasping task well with our network.
The paper is organized as follows. In Sect. 2, we briefly review the related work. In Sect. 3, we introduce the representation of the bounding box of a rotated object. Then, we introduce the proposed YOLOOD in detail in Sect. 4. Experimental details and discussion are given in Sect. 5. Finally, we conclude the paper in Sect. 6.

Related work
This section mainly discusses the related work in horizontal object detection, arbitrary-oriented object detection, and object detection for robotics.

Horizontal object detection
In recent years, object detection methods based on deep convolutional neural networks have improved significantly [18]. These methods generally locate objects with horizontal bounding boxes, as shown in Fig. 1a. Existing methods can be roughly split into single-stage and two-stage detectors based on the detection paradigm. Two-stage detectors (e.g., RCNN [10], Fast R-CNN [19], Faster R-CNN [9], Mask R-CNN [6], R-FCN [20]) first generate region proposals and then extract features from the proposals for classification and localization. Two-stage detectors generally achieve high accuracy but at some cost in speed. On the contrary, there is no explicit region proposal generation step in single-stage detectors (e.g., YOLO [8], SSD [21], RetinaNet [22]). They directly perform object classification and position regression on the feature map. Because of their one-stage structure, single-stage detectors tend to be more time-efficient. Different from the above anchor-based detectors, many anchor-free detectors (e.g., CornerNet [23], CenterNet [24], and FCOS [25]) have attracted researchers' attention in recent years. Anchor-free detectors predict the object's key points, such as the center or corner points, and then group the key points to obtain the object bounding box. The above methods have achieved significant performance in general object detection. However, when the detected object has a high aspect ratio, the horizontal bounding box cannot reflect the physical size of the object, and different objects cannot be distinguished from the background in dense scenes, so it is not appropriate to directly apply horizontal object detection to the FFC detection task.

Arbitrary-oriented object detection
Arbitrary-oriented object detection plays a crucial role in remote sensing, scene text, and face detection tasks. However, little research has introduced rotated object detection to robotic assembly. Unlike horizontal object detection, oriented object detectors generally use rotated bounding boxes to represent the object, as shown in Fig. 1b. Oriented R-CNN [29] proposed a general and straightforward network to generate oriented proposals, significantly improving computational efficiency. The above methods treat the angle as a regression target and thus suffer from the angle periodicity problem. O²D-Net [30] and X-LineNet [31] utilized anchor-free arbitrary-oriented detectors to predict a pair of intersecting lines, but these methods are inferior to anchor-based methods in performance. Recently, CSL [17] transformed angle regression into a classification problem to avoid the problem of discontinuous boundaries. Nevertheless, these methods detect angles in the range [0°, 180°) and therefore cannot distinguish the head and tail of the FFC. Moreover, they are inefficient and cannot be directly applied to FFC detection in robotic assembly.

Object detection for robotics
Much research into robotic object detection has focused on improving robustness and precision in particular scenarios. [32][33][34][35] investigated object detection for robots in cluttered scenes, mainly motivated by helping robots pick objects from a cluttered bin, but none of them focused on rotated object detection. [36,37] aimed to develop fast adaptation methods for new tasks while maintaining high detection accuracy. Another related field is robot active perception [38,39], which emphasizes improving object detection and recognition performance by exploiting the robot's active exploration of the environment. In this paper, instead, we propose a pipeline for arbitrary-oriented object detection and localization in FFC assembly that considers both detection accuracy and time efficiency. We aim to detect FFCs at any angle and separate them from the dense background as much as possible. To the best of our knowledge, this is the first work to introduce oriented object detection into robotic FFC assembly tasks.

Problem formulation
At present, there are two alternative ways to represent the bounding box of a rotated object: five-parameter methods [14,15,17,26,27,40] and eight-parameter methods [41,42]. The five-parameter method adds an angle parameter θ to the horizontal box definition (x, y, w, h). There are two different definitions of the range of θ: the OpenCV definition [43] and the long-side definition. OpenCV specifies that θ is the acute angle between the rectangle and the x-axis, ranging over [0°, 90°), with the corresponding side defined as w. In contrast, the long-side definition (x, y, w, h, θ) with a 180° angular range states that θ is the angle between the long side (h) of the rectangle and the x-axis. The eight-parameter method represents the rotated rectangle by its vertex coordinates (x1, y1, x2, y2, x3, y3, x4, y4). However, none of these representations can distinguish the head and tail of the rotated rectangle. In this paper, the long-side definition is used to describe the FFC bounding box, and a head-point position parameter is introduced to expand the angle range to [0°, 360°). Our representation contains seven parameters:

(x, y, w, h, θ, x_h, y_h)

where (x, y) is the center of the rectangle, h is the long side of the rectangle, w is the other side, θ is the angle between h and the x-axis, and (x_h, y_h) are the coordinates of the head point. Figure 2a shows an example of this 7-parameter representation. Using the position relationship between the head and center points, we can expand the detection range of the FFC angle to [0°, 360°), as shown in Fig. 3. Consider first the case of θ ∈ [0°, 90°), shown in Fig. 3b: when the head point is located in the first quadrant relative to the center point, the angle in 360-degree notation is θ; when the head point is located in the third quadrant relative to the center point, the angle in 360-degree notation is θ + 180°. Similarly, in the case of θ ∈ [−90°, 0°), for the two possible head-center position relationships, the angle in 360-degree notation is θ + 360° and θ + 180°, respectively. Then, we can reduce the 7-parameter representation to a 5-parameter representation (x, y, w, h, θ), where θ represents the counterclockwise angle between the positive x-axis and the head direction of the FFC, ranging over [0°, 360°), as shown in Fig. 2b.
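To make the quadrant logic concrete, the following minimal Python sketch (our illustration, not the paper's code; the function name and coordinate convention are assumptions) converts a long-side angle and a head-point offset into the 360-degree notation:

```python
def longside_to_360(theta, dx, dy):
    """Convert a long-side angle theta in [-90, 90) degrees to [0, 360).

    (dx, dy) is the head point minus the box center in a frame with the
    y-axis pointing up (for image coordinates with y down, flip dy first).
    """
    if theta >= 0:
        # head in the first quadrant -> theta; third quadrant -> theta + 180
        return theta if (dx >= 0 and dy >= 0) else theta + 180
    else:
        # head in the fourth quadrant -> theta + 360; second -> theta + 180
        return theta + 360 if (dx >= 0 and dy <= 0) else theta + 180
```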

Method
In this paper, an arbitrary-oriented FFC detector based on the YOLOv5 network, using angular classification and head-point detection, is proposed. Figure 4 shows the overall architecture of the proposed network, a one-stage rotation detector based on YOLOv5. Moreover, Fig. 4 shows a multi-task pipeline, including the bounding box prediction branch, angular classification branch, and head-point regression branch, to achieve arbitrary-oriented FFC detection and localization in robotic assembly. We elaborate on the details of the proposed framework in the following sections.

Basic detection network (YOLOv5)
YOLOv5, proposed by Ultralytics, is an improved version of YOLOv4 [44]. It is a single-stage detector that balances the need for accuracy and speed. Compared to previous versions, YOLOv5 improves detection accuracy through network structure improvement and strategy optimization, while achieving real-time performance with a smaller structure. These characteristics of high precision and efficiency fit the requirements of robotic object detection, so we choose YOLOv5 as the base detector in this article.
As shown in Fig. 4, the preprocessed FFC image is first sent to the Cross Stage Partial network (CSPDarknet53) for feature extraction, yielding five feature maps (C1, C2, C3, C4, C5). Then, feature pyramid network (FPN) [45] and path aggregation network (PANet) [46] structures are applied to enhance feature fusion. Their "shortcut" connections, top-down path, and bottom-up path can better extract, fuse, and preserve multi-scale features, which benefits robotic object detection at different scales. Finally, the feature maps of three scales (80×80, 40×40, 20×20) are sent to the prediction network for the following multi-task prediction, consisting of bounding box prediction, angular classification, and head-point regression. We cover these in more detail below.

Target prediction network
After feeding the image into the backbone network for feature extraction and aggregation, we transmit the features to the target prediction network. In order to achieve robust detection of parts at different scales, we use three detection scales: 20 × 20, 40 × 40, and 80 × 80. For the feature map of each scale, we use the k-means algorithm to generate three prior anchor boxes (nine anchors in total) with different scales adapted to the size of objects in the training set.
Taking the 80 × 80 detection scale as an example, we divide the feature map into 80 × 80 grids, and each grid point is preset with three anchor boxes of corresponding scales. For each grid point, the network outputs the bounding box regression (x_offset, y_offset, w, h), confidence, object classification, CSL-based angular classification, and head-point regression (x_h_offset, y_h_offset). Finally, we obtain the most reliable detection results through confidence evaluation and the non-maximum suppression algorithm [47].
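As an illustration of how such a multi-task output can be organized, the sketch below splits one anchor's raw prediction vector into its task-specific parts; the channel ordering is our assumption, since the paper does not specify the exact layout:

```python
import torch

def split_prediction(p, num_classes=1, num_angles=180):
    """Split one anchor's raw prediction vector into task-specific parts.

    Assumed channel layout (illustrative): 4 box offsets, 1 confidence,
    num_classes class logits, num_angles CSL angle logits, and 2 head-point
    offsets, i.e. 4 + 1 + num_classes + num_angles + 2 channels in total.
    """
    box = p[..., 0:4]                            # (x_offset, y_offset, w, h)
    conf = p[..., 4:5]                           # objectness confidence
    cls = p[..., 5:5 + num_classes]              # object class logits
    ang = p[..., 5 + num_classes:5 + num_classes + num_angles]  # CSL logits
    head = p[..., 5 + num_classes + num_angles:]  # (x_h_offset, y_h_offset)
    return box, conf, cls, ang, head

# e.g. an 80 x 80 map with 3 anchors and 1 class: 4 + 1 + 1 + 180 + 2 = 188 channels
p = torch.randn(3, 80, 80, 188)
box, conf, cls, ang, head = split_prediction(p)
```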

Arbitrary-oriented angle prediction based on the angular classification
In the assembly process, the initial position and orientation of the FFC are arbitrary. As a consequence, it is necessary to detect the angle parameter of the FFC for robotic operation. Mainstream rotation detectors are based on regression [15,16,27,28]. However, these methods essentially suffer from the problem of discontinuous boundaries, which makes the model's loss value increase suddenly at the boundary and prevents the model from reaching the prediction in the simplest and most direct way. To avoid this boundary discontinuity problem, we apply the circular smooth label (CSL) to transform the angle regression problem into a classification problem, as shown in Fig. 5.
CSL encodes the angle information in the following form:

CSL(x) = g(x), if θ − r < x < θ + r; 0, otherwise

where g(x) is the window function, θ is the angle swept by the long side when rotating clockwise from the x-axis, ranging over [−90°, 90°), r is the window radius, and x is the argument of the angle classification, also ranging over [−90°, 90°). Each degree of x corresponds to one category, 180 categories in total. If one-hot coding were adopted rather than the window function, then assuming the ground truth is 0° and the classifier's predictions are 1° and 89°, respectively, the two predictions would incur the same loss. However, from the detection perspective, predictions close to the ground truth should be favored. Therefore, CSL with a window function that assigns smoother values to labels is introduced, as shown in Fig. 5. An ideal window function g(x) is required to have the following attributes [17]: periodicity, g(x) = g(x + 180k), k ∈ ℕ, where 180 is the number of angle categories; and monotonicity, i.e., the function is monotonically non-increasing from the center point to both sides.
For example, if the ground truth is 10°, then CSL(10) = 1 and CSL(9) = CSL(11) = 0.7, so the assigned label values are smooth with a certain tolerance. Besides, 89° and −90° become neighbors, solving the angular periodicity problem. We choose the Gaussian function as the window function to measure the distance between the predicted angle and the ground truth: the closer the prediction is to the ground truth, the smaller the loss. In summary, we use the CSL classification algorithm to avoid the periodicity problem of angle regression and achieve robust angle prediction for FFC detection.
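A minimal sketch of the CSL encoding with a Gaussian window is given below; the window radius and standard deviation are illustrative values, not the paper's reported settings:

```python
import numpy as np

def csl_label(theta, num_classes=180, radius=6, sigma=4.0):
    """Build a Circular Smooth Label vector for theta in [-90, 90) degrees.

    The ground-truth bin gets value 1; bins within `radius` degrees receive
    Gaussian-decayed values, and indexing wraps around so that -90 and 89
    are treated as neighbours (the periodicity property).
    """
    idx = int(round(theta)) + 90           # map [-90, 90) -> bin index [0, 180)
    label = np.zeros(num_classes, dtype=np.float32)
    for offset in range(-radius, radius + 1):
        j = (idx + offset) % num_classes   # circular wrap-around
        label[j] = max(label[j], np.exp(-offset**2 / (2 * sigma**2)))
    return label

# e.g. csl_label(10.0) peaks at bin 100 with smoothly decaying neighbours
```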

Head-point regression
In the preceding sections, we combined the YOLOv5 network with the CSL angular classification algorithm to realize arbitrary-oriented FFC detection. So far, however, we can only obtain the rotated bounding box of an FFC and cannot tell its head from its tail. The natural idea is to add a head-point regression branch to our network to distinguish the orientation. With the position relationship between the head point and the center point, we can expand the detection range of the FFC angle to [0°, 360°), as shown in Fig. 3.
Traditionally, L1, L2, and smooth-L1 loss functions are utilized for such point regression. However, one limitation of these loss functions is that they are not sensitive to small errors. We employ Wing loss [48] to avoid this problem, which is calculated by:

wing(x) = w ln(1 + |x|/ε), if |x| < w; |x| − C, otherwise

where the non-negative w sets the range of the nonlinear part to (−w, w), ε limits the curvature of the nonlinear region, and C = w − w ln(1 + w/ε) is a constant that smoothly links the piecewise-defined linear and nonlinear parts. The loss function for head-point regression is then:

L_head = Σ_i wing(s_i − s′_i)

where s = (x_h, y_h) is the predicted head-point vector, s′ = (x′_h, y′_h) is the ground truth, and i indexes the i-th head point.
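For reference, a direct PyTorch implementation of the Wing loss defined above (the tensor shapes and the reduction to a mean are our choices):

```python
import math
import torch

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss: logarithmic (high-gradient) near zero, L1-like far away.

    w bounds the nonlinear region to (-w, w); eps limits its curvature;
    C = w - w*ln(1 + w/eps) joins the two pieces smoothly.
    """
    x = (pred - target).abs()
    C = w - w * math.log(1.0 + w / eps)
    loss = torch.where(x < w, w * torch.log(1.0 + x / eps), x - C)
    return loss.mean()

# e.g. wing_loss(torch.tensor([0.5, 12.0]), torch.zeros(2))
```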

Multi-task loss function
Our multi-task arbitrary-oriented FFC detection pipeline contains a bounding box regression branch, a CSL-based angular classification branch, and a head-point regression branch. The total loss function is:

L = λ₁ L_reg + λ₂ L_obj + λ₃ L_cls + λ₄ L_ang + λ₅ L_head

where L_reg, L_obj, L_cls, L_ang, and L_head denote the bounding box regression loss, confidence loss, object classification loss, angular classification loss, and head-point regression loss, respectively, and λ₁, λ₂, λ₃, λ₄, λ₅ are hyper-parameters that control the trade-off.
Since we converted the angle prediction from a regression problem to a classification problem, the angle information is decoupled from the bounding box parameters, which avoids using a non-differentiable rotated intersection over union (IoU) to optimize the bounding box regression branch. We adopt CIoU loss [49] for L_reg, which is calculated as:

L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv

with IoU = |b ∩ b^gt| / |b ∪ b^gt|, v = (4/π²)(arctan(w^gt/h^gt) − arctan(w/h))², and α = v / ((1 − IoU) + v), where b is the predicted box, b^gt is the ground truth box, b ∩ b^gt is the intersection area, and b ∪ b^gt is the union area. ρ(b, b^gt) is the Euclidean distance between the center points of the predicted and ground truth boxes, and c is the diagonal length of the smallest rectangle covering both boxes. h^gt and w^gt are the length and width of the ground truth box, respectively; h and w are the length and width of the predicted box, respectively. CIoU loss brings the predicted bounding box more in line with the real bounding box and improves detection performance. L_reg is then calculated as:

L_reg = Σ_{j=1}^{N_p} (1/N_t) Σ_{i=1}^{N_t} L_CIoU(P_reg^i, T_reg^i)

where N_p is the number of prediction layers, P_reg ∈ R^{N_t × (x_c, y_c, w, h)} are the predicted bounding boxes, N_t is the number of ground truths, T_reg ∈ R^{N_t × (x_c, y_c, w, h)} are the ground truth boxes, and (x_c, y_c, w, h) denotes the center coordinates, width, and height of a bounding box. L_obj, L_cls, and L_ang are all calculated with the binary cross-entropy (BCE) with logits loss, e.g., L_obj = BCE(P_obj, T_obj), where P_obj ∈ R^{N_a × W_i × H_i} denotes the predicted confidence, N_a is the number of anchor boxes, and W_i, H_i are the width and height of the feature map in the i-th prediction layer. T_obj ∈ R^{N_a × W_i × H_i} denotes the target confidence, computed as the CIoU between the predicted bounding box and the ground truth box.
Similarly, L_cls = BCE(P_cls, T_cls), where P_cls ∈ R^{N_t × N_c} is the predicted probability distribution over classes, N_c is the number of categories, and T_cls ∈ R^{N_t × N_c} is the ground truth distribution; L_ang is computed analogously over the CSL-encoded angle labels.
L_head is calculated with the Wing loss described in Sect. 4.4: L_head = Σ_i wing(P_head^i − T_head^i), where P_head ∈ R^{N_t × (x_h, y_h)} represents the predicted head-point coordinates and T_head ∈ R^{N_t × (x_h, y_h)} the ground truth head-point coordinates.
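Assembling the pieces, a minimal sketch of the weighted multi-task objective is given below. The default λ values mirror those reported in Sect. 5 (λ₁–λ₃ from the COCO-pretrained YOLOv5 configuration, λ₄ and λ₅ from the search in Table 2), and `bce` stands in for the BCE-with-logits losses used for L_obj, L_cls, and L_ang:

```python
import torch

bce = torch.nn.BCEWithLogitsLoss()  # used for L_obj, L_cls, and L_ang

def total_loss(l_reg, l_obj, l_cls, l_ang, l_head,
               lambdas=(0.05, 1.0, 0.5, 0.5, 0.005)):
    """Weighted sum of the five branch losses (box, confidence, class,
    angle, head point); the weights control the multi-task trade-off."""
    terms = (l_reg, l_obj, l_cls, l_ang, l_head)
    return sum(lam * l for lam, l in zip(lambdas, terms))
```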

FFC Dataset
Hundreds of FFC images at different positions and orientations were collected under various backgrounds and illumination conditions, with varying FFC density and scale. The images were resized to 640 × 640 to meet the model requirements. We used the LabelMe tool to label the oriented bounding boxes and head points of the FFCs in the dataset. Data augmentation such as vertical and horizontal flips, rotations, and Mosaic augmentation was applied to reduce the risk of overfitting. We divided the dataset into training and test sets at a 4 : 1 ratio.

DOTA [50]
DOTA is one of the largest datasets for object detection in aerial imagery, containing 2,806 images taken from different sensors and platforms, ranging in size from 800 × 800 to 4000 × 4000 pixels. The DOTA benchmark contains 15 common object categories.

HRSC2016 [51]
HRSC2016 contains a large collection of ship images collected from several prominent ports, totaling 1,061 images ranging from 300 × 300 to 1500 × 900 pixels. The training set contains 436 images, the validation set 181 images, and the test set 444 images. We crop the training and validation images into 768 × 768 patches with an overlap of 200 pixels for training.
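A sliding-window cropper along these lines could look like the following sketch (under the stated patch size and overlap; edge patches are simply clipped rather than padded, which is our choice):

```python
import numpy as np

def crop_patches(img, patch=768, overlap=200):
    """Slide a patch x patch window over an H x W x C image with the given
    overlap; patches at the right/bottom border are clipped to the image."""
    stride = patch - overlap
    h, w = img.shape[:2]
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            yield x, y, img[y:y + patch, x:x + patch]

# e.g. a 1500 x 900 image yields 768 x 768 patches with 200 px overlap
patches = list(crop_patches(np.zeros((900, 1500, 3), dtype=np.uint8)))
```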

Implementation details
All the experiments are implemented with the PyTorch 1.9 deep learning framework and Python 3.8.0 on a PC with two NVIDIA GeForce RTX 3080 GPUs with 16 GB of RAM, running Ubuntu 20.04. Stochastic gradient descent (SGD) was employed to optimize the network, with the momentum, weight decay coefficient, and initial learning rate set to 0.937, 0.0005, and 0.0001, respectively. Following the official training results on the COCO dataset, λ₁, λ₂, λ₃ are set to 0.05, 1.0, and 0.5, respectively. A series of experiments was performed to choose appropriate values of λ₄ and λ₅ (Table 2). We used the k-means and genetic learning algorithms to automatically generate anchor sizes and applied flipping, rotation, mosaic, and multi-scale data augmentation. We chose the YOLOv5m model as the base model and loaded weights pre-trained on the COCO dataset. We set the batch size to 8 and trained for 300 epochs.
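For reference, the optimizer setup described above amounts to the following sketch (assuming a `model` object; this is our illustration, not the authors' training script):

```python
import torch

def build_optimizer(model):
    # SGD with the settings reported above: momentum 0.937,
    # weight decay 0.0005, initial learning rate 0.0001
    return torch.optim.SGD(model.parameters(), lr=0.0001,
                           momentum=0.937, weight_decay=0.0005)
```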

Evaluation metrics
We mainly leverage the IoU and the average precision (AP) to verify the performance of the proposed arbitrary-oriented detector. Furthermore, we apply frames per second (FPS) to evaluate the detection efficiency of the proposed method. The evaluation approach in DOTA [50] was adopted to compute the IoU between oriented boxes. If the IoU between a ground truth box and a predicted box exceeds a certain threshold, the predicted box is marked as a true positive (TP); otherwise it is a false positive (FP). If a ground truth box is not matched, it is marked as a false negative (FN).
Precision and recall are defined as follows:

Precision = TP / (TP + FP), Recall = TP / (TP + FN)

The AP is the average precision over different recall thresholds, defined as:

AP = ∫₀¹ P(R) dR

where P represents precision and R represents recall. Moreover, AP@0.5:0.95 denotes the average AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The PASCAL VOC2007 metric was adopted to compute the AP in our experiments.
In addition, the FPS was used to verify the detection efficiency of different methods. FPS represents the number of images inferred by the network per second.
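The IoU between oriented boxes can be computed by polygon intersection, as in the DOTA evaluation; below is a minimal sketch using shapely (our illustration, not the official devkit code):

```python
from shapely.geometry import Polygon

def rotated_iou(corners_a, corners_b):
    """IoU of two oriented boxes, each given as four (x, y) corner points."""
    pa, pb = Polygon(corners_a), Polygon(corners_b)
    if not (pa.is_valid and pb.is_valid):
        return 0.0
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

# e.g. a unit square vs. the same square rotated 45 degrees about its centre
a = [(0, 0), (1, 0), (1, 1), (0, 1)]
b = [(0.5, -0.207), (1.207, 0.5), (0.5, 1.207), (-0.207, 0.5)]
print(rotated_iou(a, b))
```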

Experiments and results
A series of ablation experiments was conducted on the FFC dataset to examine the effectiveness of each component of the proposed framework. Furthermore, the proposed YOLOOD is compared with state-of-the-art oriented object detectors to show its advantages in FFC detection. Public datasets such as DOTA and HRSC2016 were also used to evaluate YOLOOD's performance. Besides, we used Class Activation Mapping (CAM) [52] to visualize feature maps and demonstrate the effectiveness of our method. Finally, FFC grasping experiments in different scenarios were performed to verify the practicability of the proposed YOLOOD.

Ablation study
Evaluation of different components A series of contrast experiments was conducted on the FFC dataset, as shown in Table 1, studying the effect of different combinations of two factors, CSL angular classification and head-point regression, on the final result.
In Table 1, the first row is the baseline, without CSL (using a one-hot label) and without head-point regression. We compare the first with the second group, and the third with the fourth group, to verify the effectiveness of CSL. The results show that CSL angular classification performs better than the one-hot label: with the CSL algorithm (groups 2 and 4), AP@0.5:0.95 increased by 37.33% and 36.80%, recall increased by 62.11% and 62.06%, and precision increased by 61.77% and 62.8%. The CSL algorithm helps the network perceive the distance between the predicted and ground truth angles and thus obtain a more robust angular prediction. On the other hand, we compare the first with the third group, and the second with the fourth group, to analyze the head-point regression branch. The results of the third and fourth groups show that with head-point regression, AP@0.5:0.95 improved by 2.48% and 1.95%, recall improved by 2.99% and 2.94%, and precision improved by 2% and 3.03%. The head-point regression branch can not only distinguish the head and tail of objects but also improve detection accuracy. We suspect that adding the head-point regression branch facilitates the learning of bounding box regression and gives the model a better perception of the object's overall size. Hence, we trained each combination of CSL angular classification and head-point regression for 300 epochs on the FFC dataset and recorded the bounding box regression loss to confirm this conjecture, as shown in Fig. 6, where the x-axis represents the training epochs and the y-axis the bounding box regression loss. The curves of the first and third groups and of the second and fourth groups show that the bounding box regression loss decreases more smoothly and converges more completely after the head-point regression branch is added. The results confirm that the head-point regression branch enables the model to learn the FFC's overall size better.
Evaluation of hyper-parameters Table 2 gives the results of different settings of the hyper-parameters in Eq. 5. We only compared the influence of different λ₄, λ₅ settings on the final results; λ₁, λ₂, λ₃ were set to 0.05, 1.0, and 0.5, respectively, consistent with the YOLOv5 model pre-trained on the COCO dataset. The test results on the FFC dataset with seven different parameter settings show that the model achieves the best performance, with 90.82% AP, 95.83% recall, and 95.88% precision, when λ₄ = 0.5 and λ₅ = 0.005.

Comparison with state-of-the-arts
Results on FFC dataset This section compares the proposed YOLOOD on the FFC dataset with ten existing arbitrary-oriented object detection methods: CSL [17], Gliding Vertex [42], Faster RCNN-O [9], R³Det [15], S²A-Net [28], KFIoU [53], SASM_Reppoints [54], Oriented R-CNN [29], ReDet [55], and ROI-Trans [27]. We adjusted some of the hyper-parameters to fit the FFC dataset scenario. For the anchor-based methods, the k-means and genetic learning algorithms were used to generate adaptive anchor sizes for the FFC dataset. Table 3 summarizes the quantitative comparison on the FFC dataset. The results indicate that the proposed YOLOOD achieves close to the best AP@0.5:0.95, at 90.82%, with minimal time cost. Compared with the regression-based method (i.e., ReDet), the proposed YOLOOD achieves a 2.96% higher AP@0.5:0.95; the CSL angular classification and head-point regression avoid the angle periodicity problem and significantly improve the detection accuracy for FFCs. Although the AP@0.5:0.95 of the proposed YOLOOD is 0.1% lower than that of ROI-Trans, the inference time of our method is only about 3/10 of ROI-Trans's: YOLOOD adopts a single-stage detection network and angular classification, which avoids the extra feature extraction and angle regression in ROI-Trans. Therefore, YOLOOD significantly shortens model inference time with little sacrifice in accuracy and is more suitable for robotic assembly scenes. Figure 7 shows partial visualization results of the proposed YOLOOD on the FFC dataset, indicating that YOLOOD can deal with noise such as illumination and background variation. Good results are also obtained for samples with varying sizes, angles, and dense objects.
Results on DOTA In order to ensure the fairness of the experiment, we did not use multi-scale data augmentation, and all experiments were carried out on the single-scale DOTA dataset. The experimental results are shown in Table 4. Our proposed method achieves the best performance with 77.20% mAP. Moreover, our model's inference speed is 66.7 FPS, which is significantly better than the other methods.
Results on HRSC2016 In order to make a comprehensive comparison on HRSC2016, we report results under both the VOC2007 and VOC2012 metrics. The experimental results are shown in Table 5. Our YOLOOD achieved the third-best performance under the VOC2007 and VOC2012 metrics, with 90.38% and 96.40%, respectively.

Performance analysis
To further verify the effectiveness of YOLOOD, we used CAM to visualize the feature maps. CAM generates heat maps for the corresponding classes, indicating which parts of the original image the neural network attends to. Figure 8 shows CAM feature visualization examples for multi-task learning. The first row is the visualization result of YOLOOD's feature maps on the FFC dataset, the second row is for the DOTA dataset, and the third row is selected from the HRSC2016 dataset. Column a is the original image, column b the confidence feature map, column c the classification feature map, and column d the feature map for different angles. In column d, the first row is the feature visualization for −76°, the second row for −61°, and the third row for −60°.
According to the heat maps in Fig. 8, for the confidence prediction task the model focuses on a small part of the features at the center of the object, whereas for the classification task the features the model attends to are more complete. The last column shows the accuracy of YOLOOD most clearly: features irrelevant to the predicted angle, such as those at the left corner of the image, are ignored. Similarly, the minor boat feature at the top left of the picture in the third row of column d is ignored. In conclusion, the proposed YOLOOD can accurately focus on different types of features according to the task, and the learning of the model is reliable and effective.

Real environment grasping
To verify the effectiveness of the proposed arbitrary-oriented robotic object detection method in the real world, various FFC grasping experiments were conducted. First, the robot grasping system was built, including a Franka Panda robotic arm, a RealSense D435i camera, a PC, and a pneumatic suction device. For simplicity, we use fixed grasp coordinates, which are indirectly calculated from the oriented bounding box and angle of the FFC. Finally, we transform pixel coordinates into real-world coordinates through Franka Panda and RealSense D435i calibration and robot hand-eye calibration. We set up two types of grasping scenarios: single objects and objects in dense scenes. The environment of each grasping scene is different, and the FFCs are randomly placed on the platform in each experiment. The experimental goal is to grasp the FFC from a random location and place it in a specified target area. Figure 9 shows the successful process of the robotic online grasping experiment under different scenes (a-d: single-object scene; e-h: dense-object scenes; the first picture of each row is the image taken by the hand-eye camera with the detection result of the proposed YOLOOD, followed by the grasping process), which proves that our arbitrary-oriented detection method has good prospects in robotic assembly.
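A minimal sketch of the pixel-to-world conversion described above is given below (the function and variable names are hypothetical; the intrinsic matrix K and hand-eye transform T_base_cam would come from the camera and hand-eye calibration, respectively):

```python
import numpy as np

def pixel_to_base(u, v, depth, K, T_base_cam):
    """Back-project pixel (u, v) with depth (metres) into the robot base frame.

    K is the 3x3 camera intrinsic matrix and T_base_cam the 4x4 homogeneous
    hand-eye transform from the camera frame to the robot base frame.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # pinhole back-projection: pixel -> 3-D point in the camera frame
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0])
    # rigid transform: camera frame -> robot base frame
    return (T_base_cam @ p_cam)[:3]
```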

Conclusion
Horizontal object detection methods based on deep learning have been widely used in robotic assembly tasks. However, when facing objects with a high aspect ratio, such as FFCs, these methods cannot accurately acquire the size and angle information of the object and can hardly separate the object from a dense background. In this paper, we studied the possibility of applying rotated object detection to FFC detection in robotic assembly. A novel YOLO-based arbitrary-oriented object detection method using angular classification and head-point regression, named YOLOOD, was proposed. Compared with traditional horizontal object detection, we can accurately locate the FFC at any angle, obtain its angle and size information, and distinguish the head and tail of the FFC. To verify the effectiveness of our method, we built an FFC dataset and conducted extensive experiments on it. The results indicated that the proposed YOLOOD obtains encouraging detection performance in terms of accuracy and speed. In conclusion, we introduce oriented object detection into the field of robotic assembly and hope to provide some inspiration to the area of robot sensing. Although significant results were obtained, our study still has some limitations. For instance, the bending of the FFC is ignored, yet in actual 3C assembly the bending of the FFC significantly impacts the assembly. Moreover, our method can only detect and locate the FFC in the two-dimensional image plane.