Lane Line Detection System Based on Improved Yolo V3 Algorithm

Abstract: Aiming at the problems of low accuracy and poor real-time performance of the Yolo v3 algorithm in lane line detection, a lane line detection system based on an improved Yolo v3 algorithm is proposed. Firstly, according to the inconsistent vertical and horizontal distribution density of lane line images, the images are divided into S × 2S grids. Secondly, the detection scale is extended to four scales, which is better suited to small targets such as lane lines. Thirdly, the convolutional layers in the original Yolo v3 network are reduced from 53 to 49 to simplify the network. Finally, parameters such as the cluster center distance and the loss function are improved. The experimental results show that the improved algorithm achieves a mean average precision (mAP) of 92.03% at a processing speed of 48 fps, a significant improvement in both detection accuracy and real-time performance over the original Yolo v3 algorithm.


1. Introduction
In recent years, with the rapid development of China's automobile industry and the continuous progress of artificial intelligence technology, vehicles with driver-assistance or automatic driving functions accounted for more than 50% of vehicles on the market by 2020 [1]. The development of the intelligent automobile industry can not only effectively increase driving safety, but also alleviate problems such as the ecological damage caused by traffic congestion.
In the field of intelligent vehicles, the automatic detection of lane markings on the road is the technical basis of driver assistance and an important part of the perception module in driverless vehicles [2]. At present, the main lane detection methods at home and abroad include: extracting road features by machine vision, establishing a road model for detection, and multi-sensor fusion detection [3].
The machine-vision approach classifies the gray-value and color features of lane lines; after learning, the lane lines on the road can be detected. Because features such as gray value and color are often affected by external conditions like light intensity and shadow, this method of lane line detection is easily disturbed by environmental changes.
The road-model approach first establishes a two-dimensional or three-dimensional model of the road, and then matches the model against the road in the photo to be detected in order to locate the lane line. Its application scope is relatively narrow: it is only applicable to roads matching known templates. In addition, the algorithm's computational load during detection is large, leading to problems such as poor real-time performance.
The multi-sensor fusion approach detects lane lines by fusing high-definition cameras, UAV aerial photography, GPS, radar and other sources [4]. This method is costly and difficult to apply in large-scale practical lane line detection systems [5].
In view of the characteristics, advantages and disadvantages of the lane line detection techniques above, this paper proposes a lane line detection system based on an improved Yolo v3. The system consists of a data preparation module, a learning and training module and a lane line detection module. Testing shows that, compared with traditional algorithms such as R-CNN and R-FCN, this detection method improves significantly in robustness, running time and other aspects.

2. Algorithm principle
Yolo is a real-time target detection algorithm proposed by Redmon et al. in 2016. Its detection speed can reach 45 frames per second, and its mAP is more than twice that of other real-time detection algorithms. The algorithm is widely used for real-time object detection in cutting-edge technologies such as autonomous driving [6].
Yolo v1 treats detection as a regression problem. It first resizes the input image to a uniform 448 × 448 pixels, then applies convolutions, and finally detects targets through a fully connected layer.
The algorithm divides the input image into S × S grid cells. If the center point of a target falls into a cell, that cell is responsible for detecting the target. The parameters predicted by each cell include the predicted values of B bounding boxes, the confidence Conf of each bounding box, and the probabilities Pr(class_i | object) of C conditional categories. Formula (1) gives the total number of values Num to be predicted per image [10].
Num = S × S × (B × (4 + 1) + C) (1)

Each bounding box includes four predicted values x, y, w and h, where (x, y) is the coordinate of the bounding box center and w and h are its width and height. The position and size of the bounding box are determined by these four values. The confidence Conf of each bounding box reflects both whether the box contains a target and the accuracy of its prediction.
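As a quick check of formula (1), the prediction count can be sketched in Python; the function name is illustrative, not from the paper:

```python
# Hypothetical sketch of formula (1): Num = S*S*(B*(4+1)+C), the total number
# of values a Yolo v1-style network predicts for one image.
def predictions_per_image(S, B, C):
    """S x S grid, B boxes per cell (4 coords + 1 confidence), C classes."""
    return S * S * (B * (4 + 1) + C)

# Yolo v1 defaults: 7x7 grid, 2 boxes per cell, 20 VOC classes.
print(predictions_per_image(7, 2, 20))  # 7*7*(2*5+20) = 1470
```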
The definition of confidence conf is shown in formula 2.
Conf = Pr(Object) × IoU(Pred, Truth) (2)

In formula (2), IoU(Pred, Truth) is the matching degree between the predicted box and the real bounding box. Its definition is shown in formula (3):

IoU = area(Box_t ∩ Box_p) / area(Box_t ∪ Box_p) (3)

where area(Box_t ∩ Box_p) is the area where the predicted box intersects the real box, and area(Box_t ∪ Box_p) is the area of their union. When the predicted box completely coincides with the real box, IoU(Pred, Truth) = 1; when the two do not overlap at all, IoU(Pred, Truth) = 0.
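The IoU of formula (3) can be sketched as follows; the corner-coordinate box format and function name are illustrative assumptions, not from the paper:

```python
# Minimal IoU sketch per formula (3): intersection area over union area of
# two axis-aligned boxes given as (x1, y1, x2, y2) corners.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to zero when boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes  -> 0.0
```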
Pr(Object) is the probability that the bounding box contains a target: Pr(Object) = 1 when a target is included, and Pr(Object) = 0 otherwise.
The conditional class probability Pr(class_i | object) is conditioned on the cell containing a detection target; each cell predicts only one set of class probabilities. Its definition is shown in formula (4) [12].
Pr(class_i | object) = Pr(class_i) / Pr(object) (4)

In formula (4), Pr(class_i) is the probability that the detected target belongs to class i, and Pr(object) is the probability that a target is contained in the bounding box.
Yolo v2 is a faster real-time target detection algorithm launched at the end of 2016 on the basis of Yolo v1. In tests, Yolo v2 reaches an mAP of 76.8% at a processing speed of 67 frames per second and 78.6% at 40 frames per second [13].
Compared with Yolo v1, the main change in Yolo v2 is a smaller network input: the original 448 × 448 pixel input image is reduced to 416 × 416 pixels. The base network is also replaced. Yolo v2 uses Darknet-19 as its classification network, composed of 19 convolutional layers and 5 max-pooling layers. In tests, compared with the Darknet network used by Yolo v1, Darknet-19 improves classification mAP by about 3.2% [15].

Yolo v3 (You Only Look Once v3) is a single-stage target detection algorithm proposed in early 2018. It maintains the running speed of the algorithm while improving prediction accuracy, and is a popular real-time target detection algorithm at present. The network structure of Yolo v3 is shown in Figure 1 [18].

Figure 1. Network structure model of Yolo v3

Compared with previous versions, the main differences of Yolo v3 include: a Darknet-53 base network model (106 layers in total, including 53 convolutional layers); independent logistic classifiers instead of the previous softmax function; and FPN-style multi-scale feature prediction, whereas Yolo v2 could only predict on a single feature map. The most prominent feature of Yolo v3 is detection at three scales: the 13 × 13 feature map detects larger targets, the 26 × 26 feature map detects medium-sized targets, and the 52 × 52 feature map detects smaller targets. The detection network is thus adapted to the size of the target [19].
The detection strides of the three feature maps in Yolo v3 are 32, 16 and 8 respectively. For a 416 × 416 pixel input, the first detection layer is the 82nd layer of the Darknet-53 network; with a stride of 32, its feature map has a resolution of 13 × 13. The second detection layer is the 94th layer; with a stride of 16, its feature map has a resolution of 26 × 26. The third detection layer is the 106th layer; with a stride of 8, its feature map has a resolution of 52 × 52 [22].
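The stride-to-resolution relationship above can be checked with a small sketch (the function name is an illustrative assumption):

```python
# Each detection layer's feature map is input_size / stride on a side.
def feature_map_sizes(input_size, strides):
    """Return the per-scale feature map resolutions for a square input."""
    return [input_size // s for s in strides]

# Yolo v3 strides 32, 16 and 8 on a 416x416 input give 13, 26 and 52.
print(feature_map_sizes(416, [32, 16, 8]))  # [13, 26, 52]
```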

2.1 Algorithm related parameters
The loss function is an important part of the Yolo v3 algorithm; it measures the inconsistency between the predicted and real values in the model. Training aims to reduce the loss value: the smaller it is, the more robust the model. Commonly used loss functions include the mean square error loss and the cross-entropy loss.
The loss function L in the Yolo v3 algorithm is defined in formula (5); it consists of three parts: the bounding box coordinate prediction error L_coord, the bounding box confidence error L_conf and the target classification prediction error L_class.
L = L_coord + L_conf + L_class (5)

The bounding box coordinate prediction error L_coord represents the error between the predicted and real bounding box positions, defined in formula (6): λ_coord is the weight coefficient of the coordinate error, S² is the number of grid cells in the image, and B is the number of predicted bounding boxes per cell. I^obj_ij indicates whether the j-th predicted box of the i-th cell contains a target: I^obj_ij = 1 if a target is included, otherwise I^obj_ij = 0. (x_i, y_i, w_i, h_i) are the center coordinates, width and height of the real bounding box in the i-th cell, and (x̂_i, ŷ_i, ŵ_i, ĥ_i) are the corresponding values of the predicted bounding box [25].

L_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I^obj_ij [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²] (6)

The confidence error L_conf is defined in formula (7): λ_noobj is the weight penalty coefficient, and C_i and Ĉ_i respectively denote the real and predicted confidence. The meaning of the other parameters is the same as in formula (6).

L_conf = −Σ_{i=0}^{S²} Σ_{j=0}^{B} I^obj_ij [C_i ln Ĉ_i + (1 − C_i) ln(1 − Ĉ_i)] − λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I^noobj_ij [C_i ln Ĉ_i + (1 − C_i) ln(1 − Ĉ_i)] (7)
The target classification prediction error L_class is defined in formula (8). There are C types of targets to be detected, c = 1, 2, 3, …, C; p_i(c) and p̂_i(c) respectively represent the real and predicted probabilities of a class-c target in the i-th cell, and the other parameters have the same meaning as in formula (6) [26].

L_class = −Σ_{i=0}^{S²} I^obj_i Σ_{c=1}^{C} [p_i(c) ln p̂_i(c) + (1 − p_i(c)) ln(1 − p̂_i(c))] (8)
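For illustration only, the three-part structure of formula (5) can be sketched as a toy NumPy function. It uses squared-error terms throughout for brevity (in Yolo v3 the confidence and class terms are cross-entropy), and all names, shapes and weights below are assumptions, not the paper's implementation:

```python
import numpy as np

def yolo_loss(pred, true, obj_mask, lam_coord=5.0, lam_noobj=0.5):
    """Toy three-part loss: rows are grid cells, columns [x, y, w, h, conf, classes...]."""
    noobj_mask = 1.0 - obj_mask
    # Coordinate error (formula 6 style), counted only for cells with objects.
    l_coord = lam_coord * np.sum(obj_mask[:, None] * (pred[:, :4] - true[:, :4]) ** 2)
    # Confidence error (formula 7 style), with a down-weighted no-object term.
    conf_sq = (pred[:, 4] - true[:, 4]) ** 2
    l_conf = np.sum(obj_mask * conf_sq) + lam_noobj * np.sum(noobj_mask * conf_sq)
    # Classification error (formula 8 style).
    l_class = np.sum(obj_mask[:, None] * (pred[:, 5:] - true[:, 5:]) ** 2)
    return float(l_coord + l_conf + l_class)

cells, num_classes = 4, 2
truth = np.zeros((cells, 5 + num_classes))
obj = np.array([1.0, 0.0, 0.0, 0.0])
print(yolo_loss(truth.copy(), truth, obj))  # identical predictions -> 0.0
```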

3. Improved Yolo v3 algorithm

3.1 Improvement of network model
Firstly, in the conventional Yolo v3 algorithm the network model is Darknet-53 and the input image is divided into S × S grid cells. In lane line images, the transverse extent of a lane line is short while the longitudinal extent is long. Therefore, to increase the grid detection density in the longitudinal direction, the improved algorithm divides the image into S × 2S grid cells, a network structure better suited to lane line detection.
Secondly, because lane line targets vary in size under different road conditions, the detection effect of the original algorithm suffers. Based on the three detection scales of the Yolo v3 algorithm, and combined with the idea of the FPN algorithm, the detection scales are extended to four: 13 × 13, 26 × 26, 52 × 52 and 104 × 104. After this improvement, each detection scale obtains the low-level details of the network as well as higher-level features, making the improved scales better suited to lane line detection than the original algorithm.
Finally, the Yolo v3 algorithm uses the Darknet-53 network as its main framework. When applied to lane line detection, the network has a large receptive field but insufficient spatial resolution, so targets are easily missed when lane lines are unclear. To solve this problem, the convolutional layers are reduced from 53 to 49 (named Darknet-49), simplifying the network structure and reducing the loss of low-dimensional information. The network upsamples the input four times to obtain feature maps at four scales, which improves detection performance. The improved network structure is shown in Figure 2.

3.2 Improvement of algorithm parameters

3.2.1 Cluster center distance parameter
The anchor parameter refers to a prediction box of fixed size. In target detection, the size of the prediction box directly affects the accuracy and speed of the algorithm, so it is a very important parameter. The original Yolo v3 algorithm obtains anchor parameters with the K-means clustering algorithm. In this method, the feature attributes in the center-distance formula are weighted equally, ignoring the differing influence of different attributes. In addition, noise points or isolated points in the sample can strongly affect the calculation and cause serious deviation.
In the lane line detection system, to solve this problem of determining the cluster center distance, the Euclidean distance in the original algorithm is replaced by a distance based on the intersection-over-union of the sample box and the cluster center. Its definition is shown in formula (9), where box is the sample's prediction box, cen is the cluster center, and IoU is the intersection-over-union of the two boxes (as in formula (3)).
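In anchor clustering this distance is commonly computed on (width, height) pairs, treating both boxes as sharing a corner. A minimal sketch under that assumption (names are illustrative):

```python
# IoU of two boxes described only by (width, height), assumed to share a corner,
# as is usual when clustering anchor shapes.
def wh_iou(box, centroid):
    w1, h1 = box
    w2, h2 = centroid
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)

# Formula (9): d(box, cen) = 1 - IoU(box, cen).
def cluster_distance(box, centroid):
    return 1.0 - wh_iou(box, centroid)

print(cluster_distance((10, 20), (10, 20)))  # identical shapes -> 0.0
```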

d(box, cen) = 1 − IoU(box, cen) (9)
A comparison of actual detection results between the improved method and the original Yolo v3 algorithm is shown in Table 1. The average accuracy and average speed of the improved method are both higher than those of the original Yolo v3 algorithm, showing that the improved cluster center distance is better suited to small targets such as lane lines.

Table 1. Comparison of detection results with different cluster center parameters

3.2.2 Loss function
The loss function estimates the inconsistency between the predicted and real values in the model; it is the basis for evaluating the false detection rate and reflects the convergence of the network model. The loss function in the Yolo v3 algorithm consists of three parts, as shown in formula (5): the bounding box coordinate prediction error takes the mean square error form, while the bounding box confidence error and the target classification prediction error take the cross-entropy form.
When calculating the loss in the Yolo v3 algorithm, the bounding box coordinate prediction error, confidence error and classification error are all activated by the logistic function, whose expression is shown in formula (10) and whose derivative is shown in formula (11):

f(x) = 1 / (1 + e^(−x)) (10)

f′(x) = f(x)(1 − f(x)) (11)

The logistic function is monotonic and continuous and its value is bounded, so the data will not diverge during network propagation. Its derivative, however, is always less than 1; when the network input is large, the derivative becomes very small, so the error of the calculated loss function grows and the convergence of the network model deteriorates.
To solve these problems, the focal loss function is used to replace the logistic loss in the original algorithm. Its expression is shown in formula (12):

FL(p) = −α y (1 − p)^γ ln(p) − (1 − α)(1 − y) p^γ ln(1 − p) (12)

where p is the output of the logistic function, (1 − p)^γ is the adjustment factor, α is the weight coefficient of the target category (0 ≤ α ≤ 1), γ is the focusing coefficient (γ ≥ 0), and y is the label.
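A sketch of the binary focal loss in the spirit of formula (12), following the standard Lin et al. form; the alpha and gamma defaults are conventional values, not taken from the paper:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: p = logistic output, y = 0/1 label."""
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)

# Easy, well-classified examples are strongly down-weighted by (1-p)**gamma,
# focusing training on hard examples such as small lane line targets.
print(focal_loss(0.9, 1))  # confident correct prediction: tiny loss
print(focal_loss(0.1, 1))  # confident wrong prediction: much larger loss
```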
The bounding box coordinate prediction error, confidence error and classification error of the improved loss function are shown in formulas (13), (14) and (15) respectively. With the improved adjustment factor, the effective reception range for small targets such as lane lines is increased and the detection accuracy is improved.

Lane detection system design
Based on the improved Yolo v3 network structure, the lane detection system consists of a data preparation module, a learning and training module and a lane line detection module. The implementation process is shown in Figure 3.
The data preparation module completes the collection, marking, screening and preprocessing of lane line samples. Samples are collected by installing a high-definition camera on a car and photographing lane lines along the way. Data marking refers to labeling the lane lines in each picture, including white solid lines, white dotted lines, yellow solid lines, yellow dotted lines and other lane line types. The main work in the screening and preprocessing stage is to select high-quality pictures from the marked ones and preprocess them into the format required by the Yolo v3 neural network. The system uses nearest neighbor down-sampling as the main picture preprocessing method.
The flow chart of the nearest neighbor down sampling algorithm is shown in Figure 4.
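A minimal, dependency-free sketch of nearest-neighbor down-sampling as described above (the exact procedure of Figure 4 is not reproduced; the index mapping below is one common implementation):

```python
# Each output pixel copies the nearest source pixel. Images are plain nested
# lists here to keep the example self-contained.
def nearest_neighbor_downsample(img, out_h, out_w):
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

src = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(nearest_neighbor_downsample(src, 2, 2))  # [[1, 3], [9, 11]]
```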

Experimental environment
For testing, in order to improve computing speed and reduce running time, the computer configuration is: an Intel Core i7-10700F CPU, an NVIDIA GTX 1060 graphics card, 16 GB of memory, the Ubuntu operating system, and Python as the programming language.

Experimental data set
In target detection, the choice of experimental data set and the feature annotation are very important; their accuracy directly affects the effect and speed of training. For the test, 500 photos containing urban lane lines were selected and divided into 400 learning samples and 100 test samples.
After the learning samples are preprocessed by nearest neighbor down-sampling, the lane lines in the pictures are marked. The learning samples are labeled manually with the professional data set annotation tool LabelMe and saved as JSON files.
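For reference, such a JSON label file can be read with the standard library. The record below is invented for illustration; the "shapes"/"label"/"points" layout follows LabelMe's usual output format:

```python
import json

# Hypothetical LabelMe-style record: each shape has a class label and a
# list of [x, y] polygon points outlining one lane line.
sample = json.loads("""
{"shapes": [
   {"label": "white_solid",   "points": [[10, 200], [12, 400]]},
   {"label": "yellow_dashed", "points": [[300, 210], [305, 395]]}
]}
""")

for shape in sample["shapes"]:
    print(shape["label"], len(shape["points"]), "points")
```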

Experimental results
The marked lane line pictures are fed into the improved Yolo v3 network for training. In the training stage, pictures of 416 × 416 pixels are used, and several important index parameters of the algorithm are recorded dynamically during training.
The change of the average loss L is shown in Figure 5. At the beginning of training the loss is about 1.8; as training proceeds, the loss converges downward, and after about 20,000 iterations it settles around 0.1, achieving the expected effect.

Performance comparison of different algorithms
The lane line detection system using the improved Yolo v3 model performs well in detection accuracy, mAP, detection time, missed detection rate and so on. Table 2 compares its performance with the traditional Yolo v3, R-CNN and Fast R-CNN algorithms. As Table 2 shows, the improved Yolo v3 lane line detection system outperforms the other three algorithms in all respects.

Conclusion
Aiming at the problem that traditional lane detection algorithms cannot balance detection accuracy and detection speed, this paper proposes a detection system based on an improved Yolo v3 algorithm. The main improvements include the following:

1. According to the inconsistent vertical and horizontal distribution density of lane line images, the images are divided into S × 2S grids to improve the vertical detection density.
2. The detection scales are extended to four: 13 × 13, 26 × 26, 52 × 52 and 104 × 104, better suited to small targets such as lane lines.

3. The convolutional layers of the original Yolo v3 algorithm are reduced from 53 to 49, simplifying the network and improving system performance.
4. Parameters such as the cluster center distance and the loss function are improved to better fit the lane line detection environment.
Actual tests show that the improved algorithm performs well when detecting flat roads, but detection is easily affected when roads have large slopes. Solving lane line detection in large-slope scenes will therefore be the focus of future research.