To quantitatively investigate the performance of the visual odometry method, we compared the estimated robot displacement and orientation with measurements from an external 3D laser scanner, the FARO Focus3D X 130. This scanner performs high-accuracy 3D reconstruction with a high-dynamic-range camera, achieving a reconstruction accuracy within 2 mm under the correct calibration method [17]. We therefore considered that it could provide reliable and accurate measurements of the current robot pose and serve as the reference in the comparison. Specifically, during each motion, we performed localization with the modified visual odometry method and recorded the estimated robot displacement and orientation. For ease of recognition in the reconstructed results, we installed four custom-made spheres, one on each corner of the robot. We then used the sphere positions to calculate the position and orientation of the robot and compared them with the estimates from visual odometry.
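The sphere-based pose computation amounts to a rigid alignment between the known sphere layout on the robot and the sphere centers extracted from the scan. The following is a minimal sketch assuming planar (2D) motion, using the SVD-based Kabsch method; the sphere coordinates in the usage example are illustrative values, not the actual layout.

```python
import numpy as np

def robot_pose_from_spheres(ref, obs):
    """Estimate the planar rigid transform (rotation, translation) mapping
    the sphere positions in the robot frame (ref) to the positions observed
    by the scanner (obs), via the SVD-based Kabsch method.
    Both inputs are (N, 2) arrays of point coordinates."""
    ref, obs = np.asarray(ref, float), np.asarray(obs, float)
    c_ref, c_obs = ref.mean(axis=0), obs.mean(axis=0)
    H = (ref - c_ref).T @ (obs - c_obs)          # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = c_obs - R @ c_ref
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return t, yaw

# Illustrative layout (mm): spheres at the robot corners, observed after
# a 30-degree rotation and a (2000, 3000) mm translation.
ref = np.array([[0, 0], [500, 0], [500, 300], [0, 300]], float)
th = np.radians(30.0)
R0 = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
obs = ref @ R0.T + np.array([2000.0, 3000.0])
t, yaw = robot_pose_from_spheres(ref, obs)  # recovers t = (2000, 3000), yaw = 30
```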
As the evaluation criterion, we utilized the localization error rate used in previous PMORPH2 investigations: an error within 100 mm over a 1500 mm displacement [3]. Here, 100 mm is the width of a single grating lattice, and 1500 mm is the regular distance between investigation points.
6.1 Basic experiments in simplified environments
In the initial phase, we constructed a simplified environment by splicing several grating blocks together to form a grating floor. We had the robot perform various types of motion, including translation, rotation, and combinations of the two. Figure 10 shows one of the generated maps, with the camera trajectory represented by purple lines. Table 2 compares the estimated displacement with the measured displacement.
Table 2. Comparison of visual odometry and 3D scanner results on a simplified short-range route

| Result type | x-direction displacement (mm) | y-direction displacement (mm) | Orientation (degree) |
|---|---|---|---|
| Visual odometry | 738.0 | 1247.9 | 95.5 |
| 3D scanner measurement | 728.3 | 1256.0 | 97.7 |
The comparison showed that the y-direction error rate was 9.7 mm/1500 mm and the x-direction error rate was 20 mm/1500 mm. We considered that this accuracy met the requirement under such simplified conditions. In addition, the grating lattices in the reconstructed map appeared quite regular, which made it possible to estimate the robot position by counting lattices.
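The reported rates are consistent with scaling the absolute difference between the two devices by 1500 mm per unit of scanner-measured displacement along the same axis. A short sketch of that calculation (the exact normalization is our assumption, inferred from the reported numbers):

```python
def error_rate_per_1500(estimated_mm, measured_mm):
    """Error per 1500 mm of travel along one axis (assumed definition):
    scale the absolute difference by 1500 / measured displacement."""
    return abs(estimated_mm - measured_mm) * 1500.0 / abs(measured_mm)

# Table 2 values
x_rate = error_rate_per_1500(738.0, 728.3)    # ~20 mm / 1500 mm
y_rate = error_rate_per_1500(1247.9, 1256.0)  # ~9.7 mm / 1500 mm
```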
6.2 Simulation experiments at the Japan Atomic Energy Agency (JAEA) Naraha Center for Remote Control Technology Development
To further investigate the performance of the visual odometry method, we utilized the experimental water tank at the Naraha Center for Remote Control Technology Development, provided by JAEA. Figure 11 shows the structure of the three-story water tank facility. At its center, a water tank with a diameter of 4690 mm served as the simulated basement body. The top floor served as a supporting platform for the investigating robot. Across the water tank, we prepared a wooden bridge with a small grating section whose lattice size was the same as that of the No.1 reactor.
Specifically, we simulated the investigation route shown as the red line in the right part of Fig. 11. Because the wooden bridge contains only a small grating section, we photographed the grating lattices and printed the images to cover the floor, thereby simulating a grating floor. Compared with the previous experiment, the route therefore involved four types of texture: the printed grating paper, the ordinary floor with anti-skid texture, the real grating, and the wooden bridge.
We repeated the simulated route three times; the comparison results are shown in Table 3. Fig. 12 shows one of the maps generated by the visual odometry method.
Table 3. Comparison results of visual odometry and 3D scanner under simulated routes

| Result type | x-direction displacement (mm) | y-direction displacement (mm) | Orientation (degree) |
|---|---|---|---|
| Visual odometry-1 | 3043 | 3022 | -117.2 |
| 3D scanner measuring-1 | 3025 | 3180 | -126.9 |
| Visual odometry-2 | 3134 | 3122 | -127.2 |
| 3D scanner measuring-2 | 3146 | 3226 | -129.1 |
| Visual odometry-3 | 3249 | 3130 | -133.9 |
| 3D scanner measuring-3 | 3192 | 3041 | -134.2 |
We calculated the error rate from these datasets: the average error rate was 35 mm/1500 mm, and the maximum error rate was 54 mm/1500 mm, which satisfied the requirement. In addition, we compared reconstructed objects, such as the cross-shaped metal supports under the small grating section of the wooden bridge, with the real ones, and confirmed the correctness of the reconstruction.
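Using the Table 3 values, a Euclidean position error normalized per 1500 mm of scanner-measured displacement reproduces the reported maximum of about 54 mm/1500 mm. Note that this per-run definition, and the averaging convention behind the reported mean, are assumptions on our part:

```python
import math

# (VO x, VO y, scanner x, scanner y) in mm, from Table 3
runs = [
    (3043, 3022, 3025, 3180),
    (3134, 3122, 3146, 3226),
    (3249, 3130, 3192, 3041),
]

def euclidean_rate(vx, vy, sx, sy):
    """Magnitude of the position error, normalized per 1500 mm of
    measured displacement (assumed definition)."""
    err = math.hypot(vx - sx, vy - sy)
    return err * 1500.0 / math.hypot(sx, sy)

rates = [euclidean_rate(*r) for r in runs]  # maximum is ~54 mm / 1500 mm
```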
6.3 Camera parameter ranges according to the required accuracy
Through the simulation experiments, we confirmed that under the specific experimental conditions, the visual odometry method provided the required localization accuracy. However, in on-site investigations, more restrictions could apply depending on the environmental conditions, such as size limitations for passing through the access penetration and camera performance limitations in extreme situations. To provide useful information for the further development of a practical model, in the following sections we focus on the parameters of the camera pose and program settings, and use an experimental method to investigate the ranges over which the error rate stays within 100 mm/1500 mm. Table 4 shows the relevant factors, their default values in the previous experiments, and the adjusting ranges. Because we mounted the camera system on a metal frame, the adjustable ranges of the camera pose might be limited.
Table 4. Default values of the concerned parameters

| Parameters | Default value | Adjusting range |
|---|---|---|
| Camera height (mm) | 400 | 250–600 |
| Camera inclined angle (degree) | 50 | 30–70 |
| Frame rate (frames/s) | 30 | 12–60 |
| Minimum number of feature points | 250 | 50–350 |
We then investigated how the localization accuracy changed with each parameter individually within its adjusting range, keeping the other parameters constant. We utilized the 3D scanner as the reference device and focused on two simplified motion modes: translation and rotation. We controlled the robot to perform repetitive motions under different parameter settings and recorded the displacement or orientation estimated by visual odometry.
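The one-parameter-at-a-time procedure can be sketched as follows. Here `measure_error_rate` is a hypothetical placeholder standing in for one experimental trial against the scanner reference, and the sweep values simply sample the adjusting ranges listed in Table 4:

```python
# One-at-a-time sweep: vary a single parameter over its adjusting range
# while holding the others at their Table 4 defaults.
DEFAULTS = {"height_mm": 400, "angle_deg": 50, "fps": 30, "min_features": 250}
SWEEPS = {
    "height_mm": [250, 300, 400, 500, 600],
    "angle_deg": [30, 40, 50, 60, 70],
    "fps": [12, 24, 30, 60],
    "min_features": [50, 150, 250, 350],
}

def measure_error_rate(params):
    # Hypothetical placeholder: in the experiments this corresponds to one
    # repetitive-motion trial compared against the 3D scanner measurement.
    return 0.0

def sweep(param):
    """Return {parameter value: error rate} for one swept parameter."""
    results = {}
    for value in SWEEPS[param]:
        trial = dict(DEFAULTS, **{param: value})
        results[value] = measure_error_rate(trial)
    return results
```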
To quantify the localization performance, we denoted the desired displacement as (x_d, y_d, θ_d), set according to the predetermined motion mode (translation or rotation). Correspondingly, we denoted the estimated displacement as (x_et, y_et, θ_et) for the translation motion and (x_er, y_er, θ_er) for the rotation motion. In the predetermined translation mode, the robot moved along its forward direction for 1000 mm on the grating floor. We denoted the error rate as ER_t, the proportion of the difference between the main displacement index and its estimate; in the translation mode, the main displacement index is the y-direction displacement. SE_t represents the shift error occurring in the displacement estimation of translation, defined by the x-direction displacement as the transverse shift error. Equations (3) and (4) give the concrete definitions of these indexes. In the predetermined translation motion, (x_d, y_d, θ_d) is set to (0, 1000, 0).
For the predetermined rotation mode, the robot rotated about its center by 45° clockwise. We defined the error rate ER_r as the proportion of the orientation difference over the desired orientation, and the displacement of the robot center as the shift error SE_r. Equations (5) and (6) give the relationships. In the predetermined rotation motion, (x_d, y_d, θ_d) is set to (0, 0, 45).
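From the verbal definitions above, Eqs. (3)–(6) can be written out as follows; this is a hedged reconstruction, since the original equations are not reproduced in this excerpt:

```latex
\begin{align}
ER_t &= \frac{\lvert y_{et} - y_d \rvert}{y_d} \tag{3}\\
SE_t &= \lvert x_{et} \rvert \tag{4}\\
ER_r &= \frac{\lvert \theta_{er} - \theta_d \rvert}{\theta_d} \tag{5}\\
SE_r &= \sqrt{x_{er}^{2} + y_{er}^{2}} \tag{6}
\end{align}
```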
We used an error rate within 100 mm/1500 mm for the translation motion, and the same proportion, 3°/45°, for the rotation motion, as the requirement. We adjusted each parameter and determined the range over which the error rate remained within this limit. The resulting tendencies of both the error rates and the shift errors with the changing parameters are presented in Figs. 13 and 14. The vertical axis indicates the error rates as a line chart, and the horizontal axis shows the changed parameter. We also included the shift error values in the figures as a bar chart at the bottom, as an alternative index of accuracy.
As the left part of Fig. 13 shows, under changing camera heights, both the error rates and shift errors in the translation motion remained at approximately the same level. We concluded that the influence of the camera height on the localization performance was relatively small in translation motion. We attributed this to the transformation of the originally inclined camera view to the bird's-eye view: the homography matrix was not sensitive to the height itself when there were no rotating components in the instantaneous motion matrix. However, when the robot performed primarily rotating motion, both the error rates and the shift errors became quite significant at camera heights below 300 mm. We supposed that when the camera height was relatively low, the available view range of the grating was correspondingly reduced; the perspective-transforming process acquires the bird's-eye view at the expense of a reduced view range. Moreover, the available view range was closely related to the available number of feature points, and a sufficient view range becomes relevant when tracking features of relatively large objects. Thus, to ensure a sufficient view range after the perspective transformation, the camera height should be set to more than 300 mm.
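The inclined-to-bird's-eye transformation discussed above is a planar homography, which can be fixed by four ground-plane correspondences. A minimal numpy sketch follows; the point correspondences are illustrative assumptions, not the calibration actually used in the experiments:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve the 8-DOF planar homography H (with H[2,2] = 1) that maps
    each src point to the corresponding dst point, from 4 point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, p):
    """Apply the homography to a 2D point (homogeneous divide included)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Illustrative correspondences: the ground-plane trapezoid seen by the
# inclined camera maps to a square region in the bird's-eye view.
src = [(100, 300), (540, 300), (0, 480), (640, 480)]  # inclined view (px)
dst = [(0, 0), (400, 0), (0, 400), (400, 400)]        # bird's-eye view (px)
H = homography_from_points(src, dst)
```

The shrinking view range at low camera heights corresponds to the trapezoid covering a smaller patch of floor, so the warped bird's-eye image contains fewer grating lattices and hence fewer trackable features.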
Under changing camera angles, the variation in the translation motion was similar to that of the camera height in the rotation motion: the error rates and shift errors became significant when the camera angle was less than 40°, while the rotation results limited the camera angle to between 40° and 60°. In the perspective transformation, the homography matrix contains mappings in both the vertical and horizontal directions. In the rotating motion, the mapping component in the horizontal direction acts similarly to the radius around the original focus in the original view, which significantly influences the localization accuracy. Thus, we considered that a suitable range of camera angles should be set to limit the mapping component in the horizontal direction. Because we aimed to transform the original view to the bird's-eye view, the homography matrix may not be robust or accurate when the original view approaches the horizontal plane, for example, when the angle is set above 70°. The results also suggest that the horizontal camera configuration of PMORPH may not be directly applicable to the visual odometry method.
As Fig. 14 shows, the two program-setting parameters exhibited similar tendencies: both the error rates and shift errors remained at a relatively low level once the parameter exceeded a specific value. We considered these tendencies similar because the frame rate directly influences the number of frames available when passing objects, and the number of feature points is relevant to the reliability of the instantaneous motion estimation. Thus, for the grating texture alone, we set the available range of the frame rate to over 24 fps and the minimum number of feature points to over 150. In an environment with more types of texture, however, the requirements on the frame rate and number of feature points may increase, because at the boundaries between textures, feature points from different textures can be difficult to match in time without sufficient frames in the feature-tracking process. Although raising the frame rate and feature point number may also increase the computational burden of the program, we did not observe any such influence under the current experimental settings.
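The frame-rate requirement above can be related to view overlap: consecutive bird's-eye frames must share enough of the floor for feature tracking to succeed. A small illustrative calculation (the speed, view length, and overlap fraction are assumed numbers, not measured values from the experiments):

```python
def min_frame_rate(speed_mm_s, view_length_mm, min_overlap):
    """Lowest frame rate keeping consecutive bird's-eye views overlapping
    by at least `min_overlap` (0..1) of the view length along the motion
    direction, so tracked feature points stay visible between frames."""
    max_step_mm = (1.0 - min_overlap) * view_length_mm  # motion per frame
    return speed_mm_s / max_step_mm

# e.g. a robot at 200 mm/s with a 400 mm bird's-eye view and 90 % overlap
fps_needed = min_frame_rate(200.0, 400.0, 0.9)  # ~5 fps
```

In practice the bound must be far stricter than bare visibility, since features near texture boundaries need several consecutive observations to be matched reliably, which is consistent with the experimentally determined 24 fps lower limit.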
Table 5. Range of parameters meeting the error rate requirement

| Parameters | Available range |
|---|---|
| Camera height (mm) | >300 |
| Camera inclined angle (degree) | 40–60 |
| Frame rate (frames/s) | >24 |
| Minimum number of feature points | >150 |
Considering the overall tendencies of both the error rates and the shift errors, we concluded that the shift error also tended to decrease when the error rate was low. The resulting parameter ranges are listed in Table 5.
In summary, we preliminarily evaluated the available ranges of the camera height, camera angle, frame rate, and minimum number of feature points under simplified translation and rotation motions, using an error rate of 100 mm/1500 mm as the criterion.