Minimally invasive surgery (MIS) uses surgical instruments to access targets inside the human body through small incisions, thereby reducing risk and injury to patients. Soft continuum robots offer distal dexterity and structural compliance, which play increasingly important roles in MIS1. At present, the control of these robots can be inaccurate due to the nonlinear behaviors of the flexible manipulator, including structural deformation, interaction with soft tissues, and collision with other instruments2. Many control methods have been proposed to improve the motion accuracy of continuum robots; they can be categorized into model-based and model-free methods3.
For model-based control, the piecewise constant curvature assumption is commonly adopted in existing models of continuum robots4. By ignoring environmental interaction and other nonlinear factors, it yields computationally efficient closed-form solutions for many continuum robots in an idealized environment. The Cosserat rod model can take external factors into consideration, but is computationally expensive for real-time control5. Other model-based methods for continuum robot control rely on finite element principles6. However, unknown disturbances acting on the robot in a soft-tissue environment often obstruct accurate navigation, while integrating a flexible force sensor into the robot is difficult to implement because of the confined space and high cost.
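To illustrate why the constant-curvature assumption is computationally attractive, the tip position of a single section follows in closed form from the arc parameters: curvature κ, bending-plane angle φ, and arc length ℓ. The sketch below is a generic textbook formulation of this kinematic map, not the specific model of any cited work:

```python
import math

def cc_tip_position(kappa, phi, length):
    """Tip position of one constant-curvature section.

    kappa  : curvature (1/m); kappa == 0 means a straight section
    phi    : bending-plane angle (rad) about the base z-axis
    length : arc length of the section (m)
    """
    if abs(kappa) < 1e-9:            # straight section: tip lies on the z-axis
        return (0.0, 0.0, length)
    r = (1.0 - math.cos(kappa * length)) / kappa   # radial offset in the bending plane
    z = math.sin(kappa * length) / kappa           # height along the base z-axis
    return (r * math.cos(phi), r * math.sin(phi), z)
```

Because the map is a handful of trigonometric evaluations, it can be solved at control rates with negligible cost, which is exactly what makes constant-curvature models suitable for real-time use in an idealized, interaction-free setting.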
On the contrary, no prior knowledge is required for model-free methods. Machine learning techniques such as neural networks, extreme learning machines, and Gaussian mixture regression have been implemented to model the inverse kinematics of soft robots7,8. However, these methods suffer from poor reliability when facing real-time challenges in an environment the robot never encountered during training. Data-driven and empirical methods have been utilized to update the model during real-time control. For example, a Jacobian matrix controller was developed using an adaptive Kalman filter, which improved convergence and tracking precision9. Thuruthel et al. reported a machine-learning-based approach for closed-loop kinematic control of continuum manipulators in the task space10. However, the control performance still depends heavily on the accuracy and volume of the training data, and achieving high precision with real-time control incurs unacceptably high costs in computation and data acquisition.
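The controller cited above9 estimates the Jacobian online with an adaptive Kalman filter. A simpler way to convey the same idea of data-driven Jacobian updating is a Broyden rank-one correction, sketched below; this is an illustrative substitute under our own assumptions, not the method of ref. 9:

```python
import numpy as np

def broyden_update(J, dq, dx, eps=1e-12):
    """Rank-one update of an estimated Jacobian J so that J_new @ dq == dx.

    J  : current Jacobian estimate, shape (n, m)
    dq : observed actuator increment, shape (m,)
    dx : observed task-space displacement, shape (n,)
    """
    dq = np.asarray(dq, dtype=float)
    dx = np.asarray(dx, dtype=float)
    denom = dq @ dq
    if denom < eps:                      # ignore negligibly small motions
        return J
    # Correct J only along the direction actually excited by dq
    return J + np.outer(dx - J @ dq, dq) / denom
```

After each update the estimate reproduces the latest observation exactly, so repeated small motions let the controller track a deforming, contact-perturbed kinematic map without any analytic model.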
Combining online and offline learning has also been proposed. An offline module is first trained on a bank of experimental data using a neural network or finite element analysis; its parameters are then refined by online training with real-world data using a proportional-integral-derivative (PID) controller11 or local Gaussian process regression12. Although online learning can adapt to environmental interactions, it requires a large training dataset, and collecting such a dataset on a real soft continuum robot system is impractical and time-consuming in engineering, which limits the implementation of online training.
Deep reinforcement learning (DRL) has recently been investigated for soft continuum manipulator control. DRL methods can generally be divided into value-based and policy-based approaches13. Value-based algorithms, such as the deep Q-network (DQN)14, have been applied to the 3-dimensional (3D) motion of a soft continuum robot15. When applying DRL to robot control, many researchers pretrain in a simulator and then transfer to the real world2,15,16. However, discrepancies between simulation and reality create a reality gap17. Although many studies have focused on simulation-to-real-world (sim-to-real) transfer to narrow this gap in various robot control problems18,19,20,21, it remains far from closed. The mismatch between simulated and real-world environments raises the demand for sim-to-real transfer of the knowledge acquired in simulation22, yet such transfer hardly works for a complex robot with high degrees of freedom, because the gap between the real robot and its simulated counterpart becomes remarkable17.
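The distinction between DQN and its Double DQN variant comes down to how the bootstrap target is formed. The sketch below contrasts the two targets using plain arrays in place of network outputs; the numerical values in the test are hypothetical and purely illustrative:

```python
import numpy as np

def dqn_target(reward, gamma, q_target_next):
    """Standard DQN: the target network both selects and evaluates
    the next-state action, which tends to overestimate values."""
    return reward + gamma * np.max(q_target_next)

def ddqn_target(reward, gamma, q_online_next, q_target_next):
    """Double DQN: the online network selects the action and the
    target network evaluates it, reducing the max-operator bias."""
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]
```

For example, with reward 1, γ = 0.9, next-state online Q-values [1, 5, 3] and target Q-values [2, 0, 4], DQN bootstraps from max(target) = 4, whereas DDQN evaluates the online argmax (action 1) under the target network, yielding a lower, less biased target.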
Besides, directly running online training on a robot with high degrees of freedom is time-consuming and requires keeping the robot operational with minimal human intervention, which is a significant engineering challenge17. Thus, a training strategy with low time cost and fast convergence is essential for applying DRL to soft robots in the real world. However, learning complex skills requires a considerable number of samples, which are difficult to collect when training on a physical robot. Sample inefficiency often leads DRL into local minima23 (Fig. 1a): the controller keeps taking the same action until it settles at a local minimum, a failure that can hardly be avoided or overcome simply by adjusting the hyperparameters of DRL exploration (Fig. 1b).
In this work, we propose an N-Space (NS) framework and investigate the feasibility of NS-based DQN (NSDQN) and NS-based Double Deep Q-Network (DDQN)24 (NSDDQN) algorithms for accurate position control of a continuum surgical manipulator. Based on the NS framework, we divide the action space into several sub-action spaces, each with its own DRL model (Fig. 1c), and the target position is reached by utilizing the DRL model of each sub-action space in turn (Fig. 1d). Experiments under various circumstances that might be encountered in MIS showed promising results. A home-made rope-driven soft continuum robot with 5 degrees of freedom (DOFs) is presented (Figs. 2a-2e), on which the NSDQN and NSDDQN algorithms demonstrated significantly improved performance, including high positioning precision, rapid response time, and reduced dependence on training data.
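The decomposition described above can be pictured as a dispatcher that owns one learned policy per sub-action space and composes their choices into a joint action. The following sketch is only a structural illustration under our own simplifying assumptions (each sub-policy maps a shared state to an action within its own actuator subset); it is not the full NSDQN/NSDDQN training loop:

```python
from typing import Callable, Dict, Sequence

# A sub-policy maps the shared state to an action index
# within its own (small) discrete sub-action space.
SubPolicy = Callable[[Sequence[float]], int]

class NSpaceController:
    """Compose one DRL policy per sub-action space into a joint action."""

    def __init__(self, sub_policies: Dict[str, SubPolicy]):
        # e.g. {"bending": pi_bend, "rotation": pi_rot}
        self.sub_policies = sub_policies

    def act(self, state: Sequence[float]) -> Dict[str, int]:
        # Each sub-policy chooses within its own action set, so no single
        # network has to cover the full joint action space.
        return {name: pi(state) for name, pi in self.sub_policies.items()}
```

The motivation for such a split is combinatorial: a joint action space of size k^n over n DOFs shrinks to n sub-spaces of size k, so each model's output layer grows linearly rather than exponentially with the number of DOFs, easing the sample-inefficiency problem noted above.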