Our assistance system for compensating vision impairment on forklifts uses an HMD; however, the overall system consists of seven submodules, shown with a yellow border in Fig. 1. The external influencing variables, which comprise the forklift, the driver, and the logistics environment, are outlined in black. The blue outline represents the system boundaries. We use a Microsoft HoloLens 2 as the HMD for visualization (module 7), while rendering is performed on PC resources instead of on the HoloLens 2 (module 6).
So-called anchors are usually used to insert virtual elements with positional accuracy. Typically, markers anchor an object at a specific position; the Microsoft HoloLens 2 can also anchor objects to walls or other geometries using spatial mapping. However, the device is intended to be worn by humans and moved mainly by them: head movements are tracked to create an immersive overlay. In our system, the HMD is located in a moving vehicle (the forklift). This means the spatial anchors to which the virtual objects are attached (lift mast, columns, etc.) can themselves move in space, while the operator can also move relative to these objects. It is therefore necessary to determine which part of the movement registered by the HoloLens 2 results from the forklift movement and which part from the head movement.
Another unique feature of our use case is that the cameras recording the environment can move relative to each other. In addition to an initial calibration, tracking the mast movement is therefore also necessary.
3.1 Initial calibration and environmental recording
We calibrate the multi-camera system using the MATLAB Camera Calibration Toolbox (Bouguet 2003). For our approach we use the coordinate systems and transformations shown in Fig. 2.
The key coordinate systems can be described as follows:
FL, FR: The frames of the left and right fork cameras (integrated in the fork tips)
F: The frame of the forklift, which is centered in the axis of the front tires
B: The frame of the base camera, which is fixed to the vehicle
AR: The frame of the AR glasses at its position during initialization
MC, ML, MR: The frames of the center, left, and right mast cameras
W: The world frame, which is fixed and used for the calculation of the forklift movement
Table 1 summarizes the transformations in our application. Some transformations are static and must be determined only once, at the beginning of module 1; others have to be determined dynamically during the runtime of the application. This is due to the structural design of the vehicle and the arrangement of the camera system. The forklift coordinate system is identical to that of the CAD model integrated in the game engine Unity, which we use for rendering.
Table 1
Key extrinsic transformations between coordinate systems
Transformation | Description | Type | Determination
FR T FL | Fork left to fork right | Static | Initial calibration
B T FR | Fork right to base | Dynamic | Initial calibration
B T MC | Mast center to base | Dynamic | External sensors
F T B | Base to forklift | Static | Measurement of 3D model
FL T AR | AR glasses to forklift | Dynamic | External sensors and HMD sensors
MC T ML | Mast left to mast center | Static | Initial calibration
MC T MR | Mast right to mast center | Static | Initial calibration
We calibrate the multi-camera system using the RGB data of each camera. The MATLAB calibration toolbox requires a checkerboard pattern, which must be captured completely by the cameras during the calibration process. In addition, the toolbox is limited to calibrating two cameras at a time.
To overcome the problem of the small overlapping FoV in our system, shown in Fig. 3, we perform a pairwise calibration between adjacent cameras. For example, the left fork camera FL is represented in the forklift coordinate system F by the following transformation:
$${}^{F}T_{FL}={}^{F}T_{B}\,{}^{B}T_{FR}\,{}^{FR}T_{FL}$$
The pairwise calibration starts by determining the rigid transformation between the left and right fork cameras, FR T FL. Next, we determine the transformation between the right fork camera and the base camera, B T FR. Finally, we determine the transformation between the base camera and the vehicle, F T B, with the help of a CAD model of the forklift.
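As an illustration of this chaining, the following NumPy sketch composes the three pairwise transformations into F T FL. The poses used here are made-up placeholders with identity rotations, not the calibrated values of the real system:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative placeholder poses (identity rotations, invented offsets in meters).
T_F_B   = make_T(np.eye(3), [1.0, 0.0, 0.5])    # base camera -> forklift
T_B_FR  = make_T(np.eye(3), [0.0, -0.4, -0.3])  # right fork camera -> base camera
T_FR_FL = make_T(np.eye(3), [0.0, 0.8, 0.0])    # left fork camera -> right fork camera

# Chain the pairwise calibrations: F_T_FL = F_T_B @ B_T_FR @ FR_T_FL
T_F_FL = T_F_B @ T_B_FR @ T_FR_FL
```

With identity rotations the translations simply add up, which makes the composition easy to check by hand.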
The calibration process using the Matlab Calibration Toolbox is necessary to get the initial poses for further processing in Unity.
It would also be possible to build a complete 3D model of the demonstrator, but it is nearly impossible to transfer the correct pose of the virtual objects into the real world this way: even minor angular errors generate a significant lateral error in the overlay.
For the environmental recording we use the proposed RGB-D multi-camera system to collect 3D scene data with color information for each pixel. The cameras on the lift mast need to be able to see behind the load. This perspective is supplemented with fork cameras in order to also generate data during storage and retrieval operations in the rack.
3.2 Forklift tracking
We use external sensors to track the forklift's movement, as the forklift's diagnostic CAN bus does not provide movement data. Rotary encoders with a floating bearing are attached to the front wheels of the forklift (Fig. 4, left). A measurement box transmits the encoder signals to the PC, and an asynchronous TCP socket integrated into the API of the measurement box sends the data to Unity for further processing. We use the two-wheel model according to Dudek and Jenkin (2000) to calculate the forklift movement from the rotational speeds of the wheels and move the digital forklift model in Unity (Fig. 4, right) accordingly.
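The two-wheel model can be sketched as a simple odometry update. The wheel radius, track width, and encoder interface below are illustrative assumptions, not the demonstrator's real parameters:

```python
import math

WHEEL_RADIUS = 0.18  # m, assumed
TRACK_WIDTH = 0.95   # m, distance between the front wheels, assumed

def update_pose(x, y, theta, omega_l, omega_r, dt):
    """Integrate the planar pose from left/right wheel angular speeds (rad/s)
    using the two-wheel (differential-drive) model."""
    v_l = omega_l * WHEEL_RADIUS
    v_r = omega_r * WHEEL_RADIUS
    v = (v_l + v_r) / 2.0          # forward speed
    w = (v_r - v_l) / TRACK_WIDTH  # yaw rate
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += w * dt
    return x, y, theta
```

Each encoder update advances the digital forklift model by the integrated pose increment.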
3.3 Lift mast tracking
The lift mast tracking consists of two parts: determining the tilt of the lift mast and determining the lift height. A tracking camera (Intel RealSense T265) is attached to the side of the fork carriage (Fig. 5, left). This camera can be integrated directly into Unity via the RealSense wrapper (Dorodnicov 2018) and transmits its pose data; we use the rotational values of this pose data to determine the tilt of the lift mast. Although the pose data also contains the vertical movement measured by the tracking camera, which would correspond to the lifting height of the forks, these values are far too inaccurate. Instead, we use an analog cable sensor: the sensor is attached to the fixed part of the lift mast and the cable to the moving part (Fig. 5, right). We send the data to Unity via the Ardity wrapper for Arduino (Wilches 2018). Since the forklift has a duplex mast, the corresponding kinematic transfer function x_i = 2 x_o is implemented; the position of the inner mast section i can thus be determined from the position of the outer section o.
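The lift-height processing can be sketched as follows. The transfer function x_i = 2 x_o is from the text; the raw-sensor scaling constant is a hypothetical calibration value added for illustration:

```python
def inner_mast_position(x_outer_m):
    """Kinematic transfer function of the duplex mast: x_i = 2 * x_o."""
    return 2.0 * x_outer_m

def fork_height_from_sensor(raw_counts, counts_per_meter=1000.0):
    """Convert a raw cable-sensor reading to the fork carriage height.
    counts_per_meter is a hypothetical calibration constant."""
    x_outer = raw_counts / counts_per_meter  # cable extension on the outer section
    return inner_mast_position(x_outer)      # height of the inner section (forks)
```

In the running system, each sensor sample updates the height of the fork carriage in the Unity scene.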
3.4 Scene reconstruction, rendering and visualization
The advancement of stereo camera technology allows us to offload the computationally intensive depth estimation to so-called RGB-D cameras, which lets us use more cameras and cover a larger FoV. For scene acquisition and distance detection we use multiple Intel RealSense D435 cameras to generate a depth map of the environment in real time, which is transformed pixel by pixel into the operator's FoV.
Cameras are integrated into Unity as 3D objects. To apply the initial calibration, each initial transformation is combined with the transformation between the RGB camera and the center frame of the camera object, as shown in Fig. 6.
The transformation must also be converted to Unity's left-handed coordinate system. Each camera object can then be described by its extrinsic parameters T:
$$T=\left[ \begin{array}{cccc}{r}_{11}& {r}_{12}& {r}_{13}& x\\ {r}_{21}& {r}_{22}& {r}_{23}& y\\ {r}_{31}& {r}_{32}& {r}_{33}& z\\ 0& 0& 0& 1\end{array} \right]$$
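One common way to convert such an extrinsic matrix from a right-handed frame to a left-handed (Unity-style) frame is a similarity transform that flips the z-axis on both sides. This is a minimal sketch of that standard conversion, not necessarily the authors' exact implementation:

```python
import numpy as np

# S flips the z-axis; S @ T @ S re-expresses the pose in a left-handed frame.
S = np.diag([1.0, 1.0, -1.0, 1.0])

def to_left_handed(T):
    """Convert a 4x4 right-handed extrinsic matrix to a left-handed convention
    by flipping the z-axis of both the source and the target frame."""
    return S @ T @ S
```

The flip negates the z-component of the translation and the corresponding rotation terms, so a pure rotation stays a pure rotation.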
The pose of the camera objects is driven by data from the external sensors or by preprocessed data from a transfer function describing the kinematics of the lift mast. Parent-child relationships in Unity make it possible to group cameras that move together, such as the three cameras on the lift mast or the two fork cameras.
Figure 7 shows the schematic rendering procedure of our system. The camera information is merged with the data from the initial calibration and the position sensors in a Unity application running on an industrial PC on the forklift, where the image is rendered. The connection to the HoloLens 2 is established via Holographic Remoting (Microsoft Corporation 2022): the HoloLens 2 does not compute the displayed scene itself but receives it from the Unity application on the PC, to which it sends its position information.
3.5 Head tracking
The HoloLens 2 head tracking uses its environmental cameras as well as an inertial measurement unit (IMU). It is not designed to work in moving vehicles, which results in drift, jumps, or even total tracking failure. Walko and Maibach (2021) proposed covering the environmental cameras of the HoloLens 2 so that only the IMU is used for head tracking, which increases tracking robustness: jumps and total failures no longer occur. Nevertheless, drift still occurs and must be corrected via external head tracking. Relying solely on external head tracking is not practical either, as it leads to a post-rendering image warp that generates considerable jitter.
We initially used an external head tracking system, but this proved impractical due to the ambient light in our test environment. Instead, we placed an ArUco marker on the HoloLens 2 and determined head position and orientation using OpenCV marker recognition (Fig. 8). A Logitech C920 placed directly in front of the driver detects the markers. An adjustment is made only if the deviation between the position determined by the HoloLens 2 and the position determined by the external head tracking exceeds 5 cm in the x, y, or z direction.
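The 5 cm correction rule can be sketched as a simple per-axis gate. Function and variable names are illustrative, not taken from the system's code:

```python
THRESHOLD_M = 0.05  # 5 cm per axis, from the rule described above

def corrected_position(hololens_pos, external_pos):
    """Return the head position to use: keep the HoloLens 2 estimate unless
    any axis deviates from the external (ArUco) measurement by more than 5 cm."""
    deviation_exceeded = any(
        abs(h - e) > THRESHOLD_M for h, e in zip(hololens_pos, external_pos)
    )
    return external_pos if deviation_exceeded else hololens_pos
```

Gating the correction this way lets the smooth internal tracking dominate while the external measurement only removes accumulated drift.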
Due to our system design, the HoloLens 2 is always the main camera in Unity, which means its position in Unity cannot be adjusted via a C# script. Accordingly, we shift the forklift model instead of changing the HoloLens 2 position. For this purpose we determine the pose of the forklift relative to the HoloLens 2 via an inverse transformation from the pose of the ArUco marker.
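The inverse-transformation step can be sketched as follows: the forklift pose relative to the headset is obtained by inverting the marker-derived headset pose and composing it with the forklift's world pose. All poses here are illustrative placeholders:

```python
import numpy as np

def invert(T):
    """Invert a rigid 4x4 transform: inv([R t; 0 1]) = [R^T, -R^T t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def forklift_relative_to_headset(T_world_headset, T_world_forklift):
    """headset_T_forklift = inv(world_T_headset) @ world_T_forklift."""
    return invert(T_world_headset) @ T_world_forklift
```

The resulting relative pose is what the forklift model is moved to, leaving the main camera untouched.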