Study on a Risk Model and Target Detection for Prediction and Avoidance of Unmanned Environmental Hazard

Comprehensive research is conducted on the design and control of unmanned systems for electric vehicles. The environmental risk prediction and avoidance system is divided into a prediction part and an avoidance part. The prediction part comprises environmental perception, environmental risk assessment, and risk prediction. In the avoidance part, a conservative driving strategy based on speed restriction is adopted according to the results of risk prediction. The core function is achieved through target detection based on a deep learning algorithm and data analysis based on deep learning methods. Moreover, the location of the bounding box is further optimized to improve the accuracy of the SSD target detection method, building on a solution to the problem of unbalanced sample categories. Software such as MATLAB and CarSim is applied in the system. The comparison of simulations of unmanned vehicles with and without the system shows that the system can provide an effective safety guarantee for unmanned driving.


In the automobile industry, unmanned driving technology has attracted a great deal of attention in recent years. It can fundamentally change the automobile industry and traffic systems, and it can also alleviate the problems of accidents, pollution, and congestion of existing vehicles and traffic [1]. The commercialization of unmanned driving should take safety as its premise, and realizing safe unmanned driving in complex driving environments [2-4] is the theme of this paper.

Anti-collision technology is one of the key points of unmanned driving research. Many achievements have been made in its development, such as sensor information fusion, anti-collision research, and anti-collision warning strategies [4-6]. However, it is still far from being completely practical considering the influence of multiple working conditions. Some scholars have summarized the problems as follows:

Limited information fusion. At present, research on sensor fusion covers only the fusion of two or three kinds of sensors, and the fused information cannot cover all working conditions [4-7]. It is necessary to fuse information from various sensors and other data sources in order to adapt to actual driving conditions.

Incomplete study of multiple road conditions. No overall consideration is given to factors such as the road environment, weather conditions, the influence of personnel in the environment, and the fastest response speed of vehicles [5-8].

The early warning strategy needs improvement. Present studies basically take distance as the evaluation index. However, in actual traffic situations, the process from safety to danger is a gradual change, and multiple evaluation indexes should be used [7-9].

To solve the problems above, this paper adopts the idea of dynamic risk assessment based on the historical data of the environment and predicts the risk by priority based on the results of environmental risk assessment [10-12]. The integration of the booming internet big data industry and electronic information engineering technology means that the risk assessment of the traffic environment no longer relies on manual rule setting and machine vision recognition alone, but can use data from navigation apps and the traffic department to realize joint modeling and statistical analysis [10]. Moreover, it is possible to dynamically assess the risk of the environment based on historical circumstances and apply the assessment results to the risk prediction of specific objectives in the environment [13-15]. Therefore, this train of thought has high practical significance and application value.

Target detection is the leading technology of hazard prediction. Current target detection is mainly aimed at pedestrians, traffic signs, or obstacles [14-16].

The realization of the environment awareness system mainly includes four steps. Firstly, positioning data are input to the system through BDS/GPS satellite positioning and LBS positioning based on WiFi and base stations. Secondly, the electronic compass module is used to refine the position. Thirdly, environmental prejudgment is realized based on position and machine vision. Finally, the environmental data are output.
Among them, the methods of satellite positioning and electronic compass positioning are quite mature, but how to achieve environmental judgment and the corresponding risk assessment on this basis is the key problem to be solved by the environmental awareness system. In this paper, a risk model based on location and accident data, together with a target detection algorithm based on deep learning, is proposed to realize environmental judgment.

Risk Model Based on Location and Accident Data
According to the location information provided by the satellite and the electronic compass, it is possible to judge the type of the nearby environment. There are six categories of judgment: residential land, industrial land, public facilities land, commercial building land, transportation facilities land, and road land. On this basis, since most driverless vehicles using this system are in motion, detailed perception based on machine vision becomes more necessary; the driving environment can be divided into two types, intersection and road. Driving environment types are shown in Table 1.

Based on the environment-aware data and the type of the nearby environment, the specific name of the nearby environment can be obtained. At the same time, the system can communicate over the Internet and obtain real-time traffic and weather conditions. Allowing for the above information, environmental risk can be judged from the following three aspects. First of all, risk judgment ultimately judges risk types and risk objectives. Secondly, the risk types can be summarized as car-car conflict risk, car-person conflict risk, car-object conflict risk, and vehicle control risk. Finally, risk objectives are visual objects such as vehicles, pedestrians, bicycles, and electric vehicles. Moreover, risk itself is divided into real risk and hidden risk: a real risk is the possibility of a collision between a risk target and the vehicle on site, while a hidden risk is difficult to confirm for various reasons, though the possibility of a collision still exists.
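To make the taxonomy above concrete, the following sketch shows one possible way to encode the environment types, risk types, and risk targets in software; the type and field names are illustrative assumptions, not part of the original system.

```python
from dataclasses import dataclass
from enum import Enum, auto

class EnvironmentType(Enum):          # the six location-based categories
    RESIDENTIAL = auto()
    INDUSTRIAL = auto()
    PUBLIC_FACILITIES = auto()
    COMMERCIAL_BUILDING = auto()
    TRANSPORTATION_FACILITIES = auto()
    ROAD = auto()

class RiskType(Enum):                 # the four conflict/control risk types
    CAR_CAR = auto()
    CAR_PERSON = auto()
    CAR_OBJECT = auto()
    VEHICLE_CONTROL = auto()

@dataclass
class RiskTarget:
    label: str          # e.g. "pedestrian", "vehicle", "bicycle"
    risk_type: RiskType
    hidden: bool        # True for hidden risks, False for real (on-site) risks
    priority: int       # higher value = higher priority in the output list
```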

Location based
Risks based on nearby environment types are shown in Table 2. Therefore, this part of the program needs the current time as input; the specific risk type and the priority of the risk objectives are then determined through a database query.
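A minimal sketch of such a query is given below. The paper only states that risk type and priority are obtained from a database; the SQLite table `risk_records(env_type, hour, weekday, risk_type, priority)` and the function name are illustrative assumptions.

```python
import sqlite3
from datetime import datetime

def query_risks(db_path: str, env_type: str, now: datetime):
    """Return (risk_type, priority) rows for the current environment type and time.

    The table layout is an assumption for illustration; only the idea of a
    time- and environment-keyed database query comes from the text.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT risk_type, priority FROM risk_records "
        "WHERE env_type = ? AND hour = ? AND weekday = ? "
        "ORDER BY priority DESC",
        (env_type, now.hour, now.isoweekday()),
    ).fetchall()
    conn.close()
    return rows
```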

Based on the scene
Because of possible errors in the location data and the database, the system confirms and supplements them with on-site target detection. First, targets ahead, such as pedestrians and vehicles, are detected. These detections are compared with the judgment results above: existing results are confirmed and marked, while unexpected obstacles that appear on the scene are identified through machine vision and added, so that no risk target is omitted. Because risk targets are divided into real risks and possible risks, existing risk targets such as vehicles and pedestrians are identified by the completed target detection algorithm, and the risk weighting under the given space-time conditions is evaluated based on historical information.

Figure 1 and Figure 2 show the accident rate by hour of day and by day of the week, respectively, based on data from Shanghai. The corresponding model is generated from these data.

Figure 1. Accident incidence rate by hour. Figure 2. Accident incidence rate by day of the week.

The accident rate varies greatly with time. The weighted values based on the accident ratio are shown in Table 3: taking the average value of 1 for each type, accident data for the whole year are used for the statistics, and the weighted value of each time period is computed accordingly. According to accidents matching the current location, weather, and time, the priority of high-risk accident objects is increased; according to the causes of accidents in the current environment, hidden risks are added and sorted. Therefore, the final program outputs the risk target list with priorities and hidden risks.
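As a sketch of how the time-period weights in Table 3 could be derived from annual accident counts, the function below normalizes each period's accident frequency to a mean of 1, so values above 1 indicate riskier-than-average periods. The exact statistics used by the authors are only given in Table 3; the names and the example data here are illustrative.

```python
import numpy as np

def time_period_weights(hourly_accident_counts):
    """Re-weight each hour so the average weight over all hours equals 1.

    hourly_accident_counts: length-24 array of accident counts for one year.
    Returns a length-24 array of weights; >1 marks a riskier-than-average hour.
    """
    counts = np.asarray(hourly_accident_counts, dtype=float)
    return counts / counts.mean()

# Illustrative example: heavier counts in the morning and evening rush hours.
counts = [5, 3, 2, 2, 3, 6, 12, 20, 25, 18, 15, 14,
          16, 15, 14, 15, 18, 24, 26, 20, 15, 12, 9, 7]
weights = time_period_weights(counts)
```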

Target detection method based on deep learning
Models trained by deep learning are applied to identify and detect targets in the sequence of captured images, and an algorithm is used to calculate the direction, speed, and distance of each target to provide data for the next step.

The velocity is estimated from the Euclidean distance moved by the target center point between adjacent frames. In short, there is a correspondence between the speed in the real world and the speed in the image: if the target moves fast in the real world, it also moves fast across adjacent pictures, so the speed can be obtained by finding the corresponding relation between the real-world speed and the image speed. From the shooting time of adjacent images and the frame rate, the moving distance of the target center and the moving speed in the image can be calculated. Because speed depends on distance and time, and time is the same for the real world and the images, the converted distance is the most critical quantity. The conversion relation can be obtained from a real size and the corresponding image size. For unmanned driving video, the license plate can be selected as an object of standard size. With the real license plate width and its width in the image, the conversion ratio is obtained, giving the real distance and the real speed. The relative velocity estimation formulas of the target are as follows.

Ratio of image to real world:

$$k = \frac{C'}{C}$$

Actual speed:

$$v = \frac{d \cdot fps}{k} = \frac{d \cdot C \cdot fps}{C'}$$

where $C$ is the real license plate size, $C'$ is the size of the license plate in the picture, $d$ is the Euclidean distance moved by the target in the image, determined by the displacement of the center point, and $fps$ is the frame rate. Since velocity is a vector, the velocity direction of the target should be obtained in addition to its magnitude. Firstly, image sequence groups within a period of time are screened out; secondly, the object center of the same target is locked, and the moving direction of the object center within the sequence group is determined to obtain the direction of the instantaneous velocity.

For distance calculation, lens distortion and other issues should be considered first when CMOS sensors are used. The correction matrix and the camera intrinsic parameters can be obtained with the MATLAB camera calibration toolbox or the calibration functions of the OpenCV library; the details are omitted here for brevity.

The system uses a fixed camera to perform a monocular distance algorithm. Through the conversions from the real-world coordinate system to the camera coordinate system, from the camera coordinate system to the image coordinate system, and from the image coordinate system to the frame storage (pixel) coordinate system, the conversion from the real world to the frame storage coordinates is realized:

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{dx} & 0 & u_0 \\ 0 & \frac{1}{dy} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

where $(X, Y, Z)$ is the real-world coordinate system, $(x_c, y_c, z_c)$ is the camera coordinate system, $(x, y)$ is the image coordinate, $dx$ and $dy$ are the physical sizes of a pixel in millimeters, $(u_0, v_0)$ is the origin of the fixed frame storage coordinate system, $(u, v)$ is an arbitrary position, $R$ is the 3x3 rotation matrix, $T$ is the 3x1 translation matrix, and $f$ is the camera focal length. This can be simplified and finally reduced to

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_1 M_2 \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

where $[u, v, 1]^T$ is the frame storage coordinate, $[X, Y, Z, 1]^T$ is the real-world coordinate, $M_1$ is the camera intrinsic parameter matrix, and $M_2 = [R \; T]$ is the camera position (extrinsic) matrix. Taking the real scene as a profile along the Y-axis, the target point is set and projected onto the Y-axis.
After derivation, the distance formula can be obtained, where Q is the distance from the camera to the nearest visible point below it, h is the camera mounting height, H is the camera head height, and $(u_0, v_0)$ is the midpoint coordinate of the image. Applying the coordinate system conversion, the distance follows from the pixel height of the image, the pixel height coordinate of the target in the image, and the intrinsic parameters provided by calibration.

At the same time, according to the vehicle's own speed, some risk targets are or will be on the collision path; such realistic risk targets are marked as the highest priority. The remaining priorities are then arranged in turn according to the speed and distance of each target.

System construction

The prediction part first recognizes and perceives the environment, for example identifying intersections, lanes, parking lots, crosswalks, and the vicinity of primary and secondary schools; this is the risk model based on location and accident data. Secondly, the risk is evaluated according to historical data, that is, the risk model based on location and accident data is used to give the predicted targets and risks. Finally, the target detection method based on deep learning detects the targets and evaluates the risk index of each target. In a word, the system needs to answer "what is the current environment", "is there any risk in the environment", "what kind of risk is there", "how dangerous are the various risks", and "how to avoid them".

The trajectory of each risk target is predicted and tracked, and the braking distance is taken as the safety range for estimation. For hidden risks, the risk of skidding caused by weather increases the braking distance, while for line-of-sight risks it is assumed that an object moving at the same speed as the vehicle is at the center of the occluded range, and the safety index is estimated accordingly. The parameters affecting the hazard value include the vehicle speed, braking performance, wet-skid degree of the road surface, and the direction of the risk target's velocity, so the hazard value should be obtained by considering these parameters comprehensively. According to the relevant literature, when emergency braking is used to avoid a collision, a required deceleration greater than 5 m/s² can be considered dangerous, 2 to 5 m/s² is critically dangerous, and below 2 m/s² can be considered safe. However, road conditions reduce braking performance, which is reflected in the deceleration at maximum braking effect, referred to as the maximum deceleration. The required braking deceleration for any object should be less than this maximum, especially for objects already in the field of view, and should also be considered, as appropriate, for predicted objects that have not yet appeared in the field of view. Therefore, the critical dangerous deceleration should also take the environmental ground friction coefficient into account.
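A minimal sketch combining the license-plate-based speed estimation and the deceleration-based hazard classification described above is given below. The scale ratio, the speed formula, and the 2 and 5 m/s² thresholds come from the text; the function names, the required-deceleration formula $v^2/(2s)$, and the friction-limited cap are illustrative assumptions.

```python
import math

def target_speed(center_prev, center_curr, plate_width_px, plate_width_m, fps):
    """Estimate real-world speed from the displacement of a target's center point
    between two adjacent frames, using the license plate as a size reference."""
    k = plate_width_px / plate_width_m            # image-to-real-world ratio k = C'/C
    d = math.dist(center_prev, center_curr)       # Euclidean displacement in pixels
    return d * fps / k                            # real-world speed in m/s

def hazard_level(ego_speed_mps, gap_m, mu=0.7, g=9.81):
    """Classify hazard from the deceleration required to stop within the gap,
    capped by the friction-limited maximum deceleration (assumed mu)."""
    required = ego_speed_mps ** 2 / (2.0 * max(gap_m, 1e-6))
    max_decel = mu * g
    if required > min(5.0, max_decel):
        return "dangerous"
    if required > 2.0:
        return "critical"
    return "safe"

# Example: target center moved 12 px between frames at 30 fps; plate 90 px wide, 0.44 m real.
v = target_speed((640, 360), (652, 360), plate_width_px=90, plate_width_m=0.44, fps=30)
level = hazard_level(ego_speed_mps=16.7, gap_m=25.0)
```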

Theoretical framework of SSD

In the original SSD paper, the following structure is presented. SSD uses a feature pyramid structure for detection, based on the feature maps of conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2; on each of them, position regression and softmax classification are performed. Figure 5 shows that SSD uses VGG-16 as the base network, and predictions are also made on the feature extraction layers in the latter part of the network. In addition, detection is performed not only on the additional feature maps but also on the lower-level conv4_3 and conv7 feature maps, in order to handle small targets.

Using VGG-16 as the base model, SSD converts the fully connected layers into a 3×3 convolution layer (Conv6) and a 1×1 convolution layer (Conv7), and changes pool5 from 2×2 to 3×3. The FC8 and dropout layers are then replaced by a series of convolution layers, which are fine-tuned on the detection dataset. The conv4_3 layer of VGG-16, with a size of 38×38, serves as the first feature map for detection; since the feature values of this layer are relatively large in scale, an L2 normalization is applied to it. Five feature maps are extracted from the new layers, namely Conv7, Conv8_2, Conv9_2, Conv10_2, and Conv11_2, and together with the original conv4_3 layer they form a total of six feature maps.
The detection results are then obtained by convolving the feature maps: the category confidence and the bounding box position are each produced by a 3×3 convolution. In essence, SSD performs dense sampling.
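The PyTorch-style sketch below illustrates these dense prediction heads (a 3×3 convolution per feature map for class confidences and another for box offsets). It is an illustrative simplification, not the authors' implementation; the channel counts and prior-box counts are assumed from the standard SSD300 configuration.

```python
import torch.nn as nn

class SSDHead(nn.Module):
    """One 3x3 conv pair per feature map: class scores and box offsets per prior."""
    def __init__(self, in_channels, num_priors, num_classes):
        super().__init__()
        self.cls = nn.ModuleList(
            nn.Conv2d(c, p * num_classes, kernel_size=3, padding=1)
            for c, p in zip(in_channels, num_priors))
        self.loc = nn.ModuleList(
            nn.Conv2d(c, p * 4, kernel_size=3, padding=1)
            for c, p in zip(in_channels, num_priors))

    def forward(self, feature_maps):
        confs, locs = [], []
        for fmap, cls_conv, loc_conv in zip(feature_maps, self.cls, self.loc):
            confs.append(cls_conv(fmap))   # (N, priors*num_classes, H, W)
            locs.append(loc_conv(fmap))    # (N, priors*4, H, W)
        return confs, locs

# Six feature maps as in SSD300 (channel and prior counts assumed from the standard model).
head = SSDHead(in_channels=[512, 1024, 512, 256, 256, 256],
               num_priors=[4, 6, 6, 6, 4, 4], num_classes=21)
```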

Algorithm training and improvement
Training

Prior box matching
Before training, the prior boxes that contain a target or part of a target are retrieved, and the matched boxes enter the prediction phase. The first step of prior box matching is to find, for each target to be identified, the prior box with the largest overlap; that prior box becomes a positive sample, while prior boxes with no corresponding target become negative samples. Secondly, any remaining negative sample whose matching degree with some target is greater than a threshold (generally 0.5) also becomes a positive sample. Moreover, a target may be matched to multiple prior boxes even if none matches perfectly, but one prior box cannot correspond to multiple targets.
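A minimal sketch of this two-step matching rule (largest overlap first, then an IoU threshold of 0.5) is shown below; the `iou` overlap matrix and the function name are assumptions for illustration rather than the authors' code.

```python
import numpy as np

def match_priors(iou, threshold=0.5):
    """Assign each prior box to at most one target.

    iou: (num_priors, num_targets) overlap matrix.
    Returns an array of target indices per prior, -1 for negative samples.
    """
    num_priors, num_targets = iou.shape
    assignment = np.full(num_priors, -1, dtype=int)

    # Step 1: each target claims the prior box with the largest overlap.
    best_prior_per_target = iou.argmax(axis=0)
    assignment[best_prior_per_target] = np.arange(num_targets)

    # Step 2: remaining priors with IoU above the threshold also become positive.
    best_target_per_prior = iou.argmax(axis=1)
    best_iou_per_prior = iou.max(axis=1)
    promote = (assignment == -1) & (best_iou_per_prior > threshold)
    assignment[promote] = best_target_per_prior[promote]
    return assignment
```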

Loss function
The loss function can be understood as the weighted sum of the confidence error and the position error.
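For reference, in the original SSD formulation this weighted sum takes the form

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right),$$

where $N$ is the number of matched prior boxes, $L_{conf}$ is the softmax confidence loss, $L_{loc}$ is the Smooth L1 position loss between the predicted box $l$ and the ground-truth box $g$, and $\alpha$ is the weighting coefficient.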

Improvement based on Focal Loss
The main reason that single-stage detection is not as accurate as two-stage detection is the imbalance of sample categories. Category imbalance brings too many negative samples, which then account for most of the loss function. Therefore, the focal loss is introduced as a new loss function. As shown in Figure 6, it modifies the standard cross-entropy loss: by changing the evaluation it down-weights easily classified samples, so that more weight is placed on hard samples during training [9]. The formula is as follows:

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $p_t$ is the predicted probability of the true class, $\gamma$ is the focusing parameter, and $\alpha_t$ is the class-balancing weight.
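A minimal PyTorch-style sketch of the focal loss in its standard binary form follows; the specific values of α and γ used by the authors are not given here and the defaults below are the commonly used ones, stated as assumptions.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples via the (1 - p_t)^gamma factor.

    logits:  raw predictions, any shape.
    targets: same shape, float values in {0., 1.}.
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```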
A further improvement converts the confidence into the standard deviation of the bounding box prediction. For two probability distributions P and Q of a discrete or continuous random variable, the KL divergence is defined as:

$$D_{KL}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$

Before calculating the KL divergence, the bounding box needs to be parameterized. A parameter without * indicates the deviation between the prediction and the anchor bounding box, and a parameter with * indicates the deviation between the ground truth and the anchor bounding box. Assuming that the coordinates are independent, a univariate Gaussian function is used for simplicity:

$$P_\Theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - x_e)^2}{2\sigma^2}\right)$$

where $x_e$ is the estimated bounding box position and the standard deviation $\sigma$ is the estimated uncertainty; when $\sigma \to 0$, the bounding box position accuracy is very high. The ground-truth bounding box can also be expressed by a Gaussian distribution, which becomes a Dirac delta function as $\sigma \to 0$:

$$P_D(x) = \delta(x - x_g)$$

where $x_g$ is the ground-truth bounding box position. At this point, a bounding box regression function with KL loss can be constructed by minimizing the KL divergence between $P_\Theta(x)$ and $P_D(x)$ over n samples:

$$L_{reg} = \frac{1}{n} \sum_{i=1}^{n} D_{KL}\left(P_D(x) \,\|\, P_\Theta(x)\right)$$

In variance voting, the weight of a neighboring box $b_i$ with respect to the selected box $b$ is $p_i = \exp\left(-\left(1 - \mathrm{IoU}(b_i, b)\right)^2 / \sigma_t\right)$, where $\sigma_t$ is an adjustable parameter: the larger the IoU, the larger $p_i$, that is, the more the two bounding boxes overlap each other; the same applies to the remaining coordinate values. SSD computes the classification loss of the generated preselected boxes through the focal loss function and performs bounding box regression, and the bounding box regression of SSD is improved based on the KL loss method. Boxes with large variance, and neighboring boxes that contain the selected box but are too small, receive low scores when voting. By using variance voting instead of the IoU overlap degree alone, the SSD algorithm can effectively avoid the abnormal situations mentioned above.

Model testing and analysis

The environment perception is divided into two parts: the macro part is based on location and historical data, while the micro part is mainly the perception of the scene by machine vision, which is used to confirm and supplement the macro perception. First of all, the ROI weighting was tested using live campus photos taken on May 7, 2020. The advantage of this algorithm is that the region of interest can be identified first, and further perception can then be completed; therefore, the test of the region of interest was performed first, and the effect of attention weighting was significant. Second, the environment perception test was carried out, with the region of interest weighted and the weighted region described first. After testing, the algorithm can complete the perception of a simple traffic scene, recognize the red light at the intersection, the bus, and the right-turn sign on the road, and supplement and confirm the environment perception.

The focus is on target detection in the hazard prediction section. First, the vehicle test is carried out using field test images and dataset pictures. Secondly, dynamic vehicles need to be detected, including their speed, distance, and running direction. The vehicle target detection is shown in Figure 7 and Figure 8. The dynamic vehicle direction estimation is shown in Figure 9 and Figure 10. The dynamic vehicle distance estimation is shown in Figure 11. The vehicle speed detector is used to detect the speed of dynamic vehicles in Figure 12.

The route from the school to the bus station is chosen. The path passes through two campuses, two residential areas, a commercial center, and four intersections. Its total length is 5.6 kilometers, which can meet the needs of system testing and simulation. To facilitate the simulation, the latitude and longitude of the path are sampled. In addition, the path can be divided into two segments, Tianshan Road and Qingnian Road, and the results are shown in Table 8 and Table 9, respectively. The system test path map is shown in Figure 14.

To facilitate the simulation of the system function, the speed constraint on the simulation path is visualized. Considering the unity of safety and efficiency, time has a great influence on the speed constraint. Assuming that the vehicle is traveling at 60 km/h, simulated speed constraints are provided for Sunday at 8:00, Monday at 8:00, and Monday at 23:00. The system also adjusts them appropriately according to the risk weighting. The speed constraint was loaded into CarSim for dynamic simulation, and the data at 23:00 on Monday were selected to check the difference between simulations with and without the system.

The comparison of speed constraints between Monday and Sunday is shown in Figure 15. On the whole, the speed constraints on Monday are stricter than those on Sunday, which results from the risk weighting based on historical experience. On top of the time weighting, roads and intersections of different levels are weighted simultaneously by the system, and the corresponding speed constraints are finally formed. At this stage, the speed constraint does not consider vehicle dynamics or driving comfort. In practical application, implementing the speed constraint requires considering the acceleration needed from the vehicle's current speed, and this acceleration must be assessed comprehensively according to the center of gravity, braking performance, acceleration performance, ground friction coefficient, and other factors, which are ignored in the simulation.
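As an illustration of the feasibility check mentioned above, the sketch below computes the deceleration required to reach a constrained speed within a given distance and compares it with a friction-limited maximum and a comfort threshold; the kinematic formula $v^2 = v_0^2 - 2as$ and all parameter values are assumptions for illustration, not values from the paper.

```python
def required_deceleration(current_speed, target_speed, distance):
    """Deceleration (m/s^2, positive) needed to slow from current_speed to
    target_speed within `distance` meters, from v^2 = v0^2 - 2*a*s."""
    return max(current_speed ** 2 - target_speed ** 2, 0.0) / (2.0 * distance)

def constraint_is_feasible(current_speed, target_speed, distance,
                           friction_coeff=0.7, g=9.81, comfort_limit=2.0):
    """A speed constraint is treated as feasible if the needed deceleration stays
    below both the friction-limited maximum and a comfort threshold (assumed)."""
    a = required_deceleration(current_speed, target_speed, distance)
    return a <= min(friction_coeff * g, comfort_limit)

# Example: slowing from 60 km/h to 40 km/h over 80 m needs about 0.96 m/s^2.
ok = constraint_is_feasible(60 / 3.6, 40 / 3.6, distance=80.0)
```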

The comparison of speed constraints at different times of the same day is shown in Figure 16. The speed constraint at 23:00 is relaxed, and vehicles are allowed to travel beyond the standard speed. In practical application, the unmanned driving system needs to combine the road supervision situation with the on-site traffic situation when executing a speed; the system only outputs speed constraints from the perspective of environmental hazards and does not determine the final execution speed.

The comparison of simulated driving speeds of vehicles equipped with the system and human-driven vehicles is shown in Figure 17. Since most of the front part of the simulated route passes schools and intersections while the other parts are expressways with few intersections, different driving speeds are simulated on the basis of actual driving. Under the ideal condition of smooth traffic, human-driven vehicles are affected by road grade, traffic control, and subjective judgment. The driving speed of vehicles equipped with this system is similar to that of human drivers in trend, and the speed constraint is strictly implemented according to the risk grade. It can be seen that such vehicles can realize human-like defensive driving more intelligently and flexibly, relying on accurate, scientific, and objective data analysis rather than subjective experience.

In order to reflect the effectiveness of the system more concretely, an application test of the vehicle with and without the system is carried out through a car accident simulation built in CarSim. In the simulated traffic accident, an oncoming vehicle traveling at more than 100 km/h strays into the lane while avoiding the normally running vehicle and eventually rolls over; the normally running vehicle, traveling at 100 km/h, completes emergency braking during the avoidance. The accident site is a freeway. In the simulation, a speed of 70 km/h is set as the normal driving speed, which is consistent with actual expressway use. Figure 18(a) demonstrates that the vehicle without the system runs normally at 70 km/h; within the visible sight distance, no vehicle was observed entering the lane from the opposite direction, and after the danger was found, the collision was avoided by emergency braking. Figure 18(b) shows that the vehicle equipped with the system decelerated before the intersection due to the speed constraints and drove through it at a speed approaching 40 km/h; after the other vehicle entered the sight distance, lane change and braking operations were adopted to avoid the collision safely and stably. It can be seen that the braking curve of the vehicle with the system is smoother than that of the vehicle without it, and the danger was successfully avoided with preparation in advance. Therefore, the simulation proves the effectiveness of the system to some extent.

Environmental hazard prediction and avoidance technology is the key in the research field of