Minimally invasive surgery (MIS) uses surgical instruments to access targets inside the human body through small incisions, thereby reducing risk and injury to patients. Soft continuum robots offer distal dexterity and structural compliance, which play increasingly important roles in MIS1. At present, the control of these robots can be inaccurate due to the nonlinear behaviors of the flexible manipulator, including structural deformation, interaction with soft tissues, and collision with other instruments2. Many control methods have been proposed to improve the motion accuracy of continuum robots; they can be categorized into model-based and model-free methods3.
For model-based control, the piecewise constant curvature assumption is commonly adopted in existing models of continuum robots4. By ignoring environmental interaction and other nonlinear factors, it yields computationally efficient closed-form solutions for many continuum robots in an idealized environment. The Cosserat rod model can take external factors into consideration, but is computationally expensive for real-time control5. Other model-based methods for continuum robot control rely on finite element principles6. However, unknown disturbances acting on the robot in a soft-tissue environment often obstruct accurate navigation, while integrating a flexible force sensor into the robot is difficult to implement because of the confined space and high cost.
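To illustrate why the constant-curvature assumption is computationally attractive, the tip position of a single section follows in closed form from the arc parameters: curvature κ, bending-plane angle φ, and arc length ℓ. The sketch below is a generic textbook formulation of this kinematic map, not the specific model of any cited work:

```python
import math

def cc_tip_position(kappa, phi, length):
    """Tip position of one constant-curvature section.

    kappa  : curvature (1/m); kappa == 0 means a straight section
    phi    : bending-plane angle (rad) about the base z-axis
    length : arc length of the section (m)
    """
    if abs(kappa) < 1e-9:            # straight section: tip lies on the z-axis
        return (0.0, 0.0, length)
    r = (1.0 - math.cos(kappa * length)) / kappa   # radial offset in the bending plane
    z = math.sin(kappa * length) / kappa           # height along the base z-axis
    return (r * math.cos(phi), r * math.sin(phi), z)
```

Because the map is a handful of trigonometric evaluations, it can be solved at control rates with negligible cost, which is exactly what makes constant-curvature models suitable for real-time use in an idealized, interaction-free setting.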
On the contrary, no prior knowledge is required for model-free methods. Machine learning techniques such as neural networks, extreme learning machines, and Gaussian mixture regression have been implemented to model the inverse kinematics of soft robots7,8. However, these methods suffer from poor reliability when facing real-time challenges in an environment the robot never encountered during training. Data-driven and empirical methods have been utilized to update the model during real-time control. For example, a Jacobian matrix controller was developed using an adaptive Kalman filter, which improved convergence and tracking precision9. Thuruthel et al. reported a machine-learning-based approach for closed-loop kinematic control of continuum manipulators in the task space10. However, the control performance still depends heavily on the accuracy and volume of the training data, and achieving high precision with real-time control incurs unacceptably high costs in computation and data acquisition.
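The controller cited above9 estimates the Jacobian online with an adaptive Kalman filter. A simpler way to convey the same idea of data-driven Jacobian updating is a Broyden rank-one correction, sketched below; this is an illustrative substitute under our own assumptions, not the method of ref. 9:

```python
import numpy as np

def broyden_update(J, dq, dx, eps=1e-12):
    """Rank-one update of an estimated Jacobian J so that J_new @ dq == dx.

    J  : current Jacobian estimate, shape (n, m)
    dq : observed actuator increment, shape (m,)
    dx : observed task-space displacement, shape (n,)
    """
    dq = np.asarray(dq, dtype=float)
    dx = np.asarray(dx, dtype=float)
    denom = dq @ dq
    if denom < eps:                      # ignore negligibly small motions
        return J
    # Correct J only along the direction actually excited by dq
    return J + np.outer(dx - J @ dq, dq) / denom
```

After each update the estimate reproduces the latest observation exactly, so repeated small motions let the controller track a deforming, contact-perturbed kinematic map without any analytic model.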
Combining online and offline learning has also been proposed. An offline module is first trained on a bank of experimental data using a neural network or finite element analysis; its parameters are then refined by online training with real-world data using a proportional-integral-derivative (PID) controller11 or local Gaussian process regression12. Although online learning can adapt to environmental interactions, it requires a large training dataset, and collecting such a dataset on a real soft continuum robot system is impractical and time-consuming in engineering, which limits the implementation of online training.
Deep reinforcement learning (DRL) has recently been investigated for soft continuum manipulator control. DRL methods can generally be divided into value-based and policy-based approaches13. Value-based algorithms, such as the deep Q-network (DQN)14, have been applied to the 3-dimensional (3D) motion of a soft continuum robot15. When applying DRL to robot control, many researchers pretrain in a simulator and then transfer to the real world2,15,16. However, discrepancies between simulation and reality create a reality gap17. Although many studies have focused on simulation-to-real-world (sim-to-real) transfer to narrow this gap in various robot control problems18,19,20,21, it remains far from closed. The mismatch between simulated and real-world environments raises the demand for sim-to-real transfer of the knowledge acquired in simulation22, yet such transfer hardly works for a complex robot with high degrees of freedom, because the gap between the real robot and its simulated counterpart becomes remarkable17.
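The distinction between DQN and its Double DQN variant comes down to how the bootstrap target is formed. The sketch below contrasts the two targets using plain arrays in place of network outputs; the numerical values in the test are hypothetical and purely illustrative:

```python
import numpy as np

def dqn_target(reward, gamma, q_target_next):
    """Standard DQN: the target network both selects and evaluates
    the next-state action, which tends to overestimate values."""
    return reward + gamma * np.max(q_target_next)

def ddqn_target(reward, gamma, q_online_next, q_target_next):
    """Double DQN: the online network selects the action and the
    target network evaluates it, reducing the max-operator bias."""
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]
```

For example, with reward 1, γ = 0.9, next-state online Q-values [1, 5, 3] and target Q-values [2, 0, 4], DQN bootstraps from max(target) = 4, whereas DDQN evaluates the online argmax (action 1) under the target network, yielding a lower, less biased target.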
Besides, directly running online training on a robot with high degrees of freedom is time-consuming and requires keeping the robot operational with minimal human intervention, which is a significant engineering challenge17. Thus, a training strategy with low time cost and fast convergence is essential for applying DRL to soft robots in the real world. However, learning complex skills requires a considerable number of samples, which are difficult to collect when training on a physical robot. Sample inefficiency often leads DRL into local minima23 (Fig. 1a): the controller keeps taking the same action until it settles at a local minimum, a failure that can hardly be avoided or overcome simply by adjusting the hyperparameters of DRL exploration (Fig. 1b).
In this work, we propose an N-Space (NS) framework and investigate the feasibility of NS-based DQN (NSDQN) and NS-based Double Deep Q-Network (DDQN)24 (NSDDQN) algorithms for accurate position control of a continuum surgical manipulator. Based on the NS framework, we divide the action space into several sub-action spaces, each with its own DRL model (Fig. 1c), and the target position is reached by utilizing the DRL model of each sub-action space in turn (Fig. 1d). Experiments under various circumstances that might be encountered in MIS showed promising results. A home-made rope-driven soft continuum robot with 5 degrees of freedom (DOFs) is presented (Figs. 2a-2e), on which the NSDQN and NSDDQN algorithms demonstrated significantly improved performance, including high positioning precision, rapid response time, and reduced dependence on training data.
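The decomposition described above can be pictured as a dispatcher that owns one learned policy per sub-action space and composes their choices into a joint action. The following sketch is only a structural illustration under our own simplifying assumptions (each sub-policy maps a shared state to an action within its own actuator subset); it is not the full NSDQN/NSDDQN training loop:

```python
from typing import Callable, Dict, Sequence

# A sub-policy maps the shared state to an action index
# within its own (small) discrete sub-action space.
SubPolicy = Callable[[Sequence[float]], int]

class NSpaceController:
    """Compose one DRL policy per sub-action space into a joint action."""

    def __init__(self, sub_policies: Dict[str, SubPolicy]):
        # e.g. {"bending": pi_bend, "rotation": pi_rot}
        self.sub_policies = sub_policies

    def act(self, state: Sequence[float]) -> Dict[str, int]:
        # Each sub-policy chooses within its own action set, so no single
        # network has to cover the full joint action space.
        return {name: pi(state) for name, pi in self.sub_policies.items()}
```

The motivation for such a split is combinatorial: a joint action space of size k^n over n DOFs shrinks to n sub-spaces of size k, so each model's output layer grows linearly rather than exponentially with the number of DOFs, easing the sample-inefficiency problem noted above.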