The study of robotic hands is an interdisciplinary field that draws on mechanics, mathematics, control theory, and machine learning. Researchers aim to replicate the dexterity and movement capabilities of the human hand in mechanical structures known as humanoid or robotic hands; a prime example of such dexterity is the grasping and manipulation of objects. Grasping an object with a robot hand is a challenging task whose success depends on several factors, such as the shape and position of the object. Accordingly, dexterous hands have been investigated in a variety of grasping applications. Hardware-based methods require reprogramming when dealing with new objects or even slight variations in the grasping process. The advancement of artificial intelligence (AI) has had a significant impact on robotic hands, surpassing traditional hardware-based methods. Deep learning and reinforcement learning (RL) are subfields of AI that have revolutionized the way robots perceive and interact with objects, particularly in the context of grasping. Furthermore, the applications of robotic hands extend to areas such as surgery, prosthetics, manufacturing, and rehabilitation.
The anatomy of the human hand is complex and remarkable, enabling a wide range of dexterous movements. The hand comprises numerous bones that work together to provide motion and dexterity. Finger joints are classified into metacarpophalangeal (MCP), proximal interphalangeal (PIP), and distal interphalangeal (DIP) joints [1]. Figure 1 shows the joints of the fingers.
The human hand possesses a total of 27 degrees of freedom, which underlie its considerable ability to perform intricate movements and handle a wide range of tasks. These movements allow humans to grasp objects with precision, adaptability, and finesse. Grasping an object involves securely holding it with the hand or fingers. Several taxonomies are widely used in the field of grasp analysis. These classifications categorize hand movements and grasping patterns based on functional aspects such as the shape and orientation of the object being grasped, the fingers involved, and the type of grasp applied. One commonly used taxonomy is the Cutkosky classification [3], shown in Fig. 2.
This classification comprises power grasps and precision grasps. A power grasp involves full contact between the hand and the grasped object; this type of grasp is employed when stability and grasping strength are crucial. Precision grasps rely more heavily on the index finger and thumb to achieve dexterity and control, and are employed when tasks require more delicate or precise manipulation. Both grasp types are necessary for human dexterity and for effective interaction with objects in the environment. The Cutkosky classification is a widely recognized reference and framework for designing robotic hands. The Shadow Hand [4] is a highly dexterous robotic hand with 24 degrees of freedom, developed by the Shadow Robot Company. This robot hand mimics the human hand's structure and capabilities, allowing complex grasping and manipulation tasks. The DLR Hand II [5] is an anthropomorphic robotic hand with five fingers and 20 degrees of freedom, developed by the German Aerospace Center. The DLR Hand II features tactile sensing capabilities and is designed for complex manipulation tasks and human-robot interaction. The use of 3D printing and rapid prototyping techniques has become increasingly popular in the development of robotic hands. These technologies offer several advantages, including cost-effectiveness, customization, and faster iteration cycles. OpenBionics [6] is a company that creates affordable, customizable robotic hands for amputees using 3D printing. InMoov [7] is an open-source project that provides a framework for building a functional humanoid robotic hand; the InMoov hand is designed primarily for 3D-printed fabrication and is driven by servo motors.
The integration of AI, particularly deep learning, into robot hand grasping research has brought significant advancements. It has enabled researchers to overcome the limitations of hardware-specific approaches, achieve generalization across different systems, and develop adaptive and intelligent grasping capabilities. Deep learning algorithms, such as convolutional neural networks (CNNs) [8], can be used for analyzing visual data, image classification, and object detection. A CNN applies sequential operations to the input image, typically consisting of input, convolution, pooling, and fully connected layers. Deep CNNs excel at solving complex problems with high accuracy, but they are computationally intensive and time-consuming to train. Several solutions have been proposed to address these problems. Transfer learning is a technique in which a pre-trained model, typically trained on a large-scale dataset (e.g., ImageNet), is used as a starting point for a new task or a different dataset [9]. This approach reduces the need to train a network from scratch, saving both time and resources. Fine-tuning is a technique used in transfer learning that adjusts the parameters of the pre-trained model to adapt it to the specific characteristics and requirements of the new task. The Visual Geometry Group (VGG) network [10] is one of the common pre-trained CNN architectures; VGG-16 consists of 16 layers and was proposed by the Visual Geometry Group at the University of Oxford. Devaraja et al. [11] trained a robotic hand on multiple objects and evaluated its performance in recognizing object shapes. They employed a support vector machine (SVM) classifier to recognize object shapes and reported an average recognition accuracy of 94.4%. Levin et al. [12] focus on a learning-based approach for robotic grasping using monocular images, applying deep learning techniques to develop a grasp success prediction model.
Their experimental results demonstrate the effectiveness of this method in grasping a diverse range of objects.
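The layer sequence described above (convolution, pooling, fully connected) can be sketched as a minimal NumPy forward pass. The input size, filter values, and the four output classes below are illustrative placeholders, not the architecture used in this study:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; trims edges that do not fit."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0)

rng = np.random.default_rng(0)
image = rng.random((8, 8))           # toy grayscale input image
kernel = rng.random((3, 3)) - 0.5    # one convolution filter (placeholder weights)
features = max_pool(relu(conv2d(image, kernel)))   # conv -> ReLU -> pool
w = rng.random((features.size, 4))   # fully connected layer, 4 hypothetical classes
logits = features.flatten() @ w      # class scores
print(logits.shape)  # (4,)
```

In a trained network the kernel and fully connected weights are learned from data; fine-tuning, as discussed above, would re-learn only the final `w` while keeping the earlier filters fixed.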
In recent years, RL has provided suitable solutions for minimizing human involvement in data collection and improving accuracy in robotic hand grasping. RL is another subfield of machine learning that specifically focuses on sequential decision-making. In RL, an agent learns to take actions in an environment so as to maximize a reward signal. The agent receives feedback in the form of rewards or penalties based on its actions, enabling it to learn optimal strategies through trial and error. A Markov decision process (MDP) is a mathematical model used to formalize the RL problem and consists of five components: a set of states, actions, transition probabilities, rewards, and a discount factor. The key assumption in an MDP is the Markov property, which states that the current state contains all the information needed to predict the next state. RL algorithms aim to find an optimal policy by estimating the value function, which measures the expected cumulative reward from a specific state or state-action pair [13]. The value function allows the agent to prioritize actions that lead to higher rewards. RL algorithms can be broadly classified into two categories, model-based and model-free methods, depending on how the agent learns and utilizes knowledge about the environment. Model-free RL is more flexible and can handle environments with complex dynamics, since it does not rely on a learned model; however, it may require more exploration to discover the optimal policy and can take longer to converge than model-based approaches. Model-free RL algorithms often work with state-action value functions. Q-learning is a well-known model-free algorithm for solving RL problems [14]. To balance exploration and exploitation, Q-learning often employs an ε-greedy policy: with probability (1 − ε), the agent selects the action with the highest Q value for the given state.
This exploitation step chooses the actions believed to have the highest expected rewards based on the learned Q values. With probability ε, on the other hand, the agent selects a random action, ensuring continued exploration.
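The ε-greedy Q-learning loop just described can be sketched on a toy problem. The five-state chain environment, hyperparameters, and optimistic initialization below are illustrative assumptions, not the setup used in this study:

```python
import random

# Toy 5-state chain: the agent moves left/right; reaching state 4 gives reward 1.
N_STATES, ACTIONS, GOAL = 5, (0, 1), 4   # action 0 = left, 1 = right

def step(state, action):
    """One environment transition: (next state, reward, terminal flag)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    # Optimistic initialization encourages early exploration of both actions.
    Q = [[1.0, 1.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):                 # cap episode length
            # epsilon-greedy: with probability eps explore (random action),
            # otherwise exploit the action with the highest Q value.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: Q[s][a])
            s2, r, done = step(s, a)
            # Q-learning update toward r + gamma * max_a' Q(s', a')
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q

Q = q_learning()
# The greedy policy derived from the learned Q values moves right toward the goal.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

The same update rule underlies deep variants used in grasping, where a neural network replaces the Q table to handle high-dimensional observations.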
Training RL agents on physical robots requires collecting a significant amount of training data. This process can be time-consuming and impractical, as it often involves repetitive interactions with the robot over extended periods. Moreover, RL agents typically explore and learn through trial and error, which can be unsafe when applied directly to physical robots [15]. To address these challenges, robotic simulators are employed. Simulators provide virtual environments that mimic the behavior and physics of real-world robotic systems. By using simulators, RL agents can learn and explore different tasks in a faster, more scalable, and cost-effective manner [16]. MuJoCo [17] is a physics engine developed by Emanuel Todorov and is widely used in robotics research, particularly in studies of robotic hands and manipulation tasks. Training simulated robotic arms with RL algorithms to perform tasks such as reaching a goal has shown promising results [18]. By applying RL techniques in simulated environments, researchers can train robotic arms to acquire the skills needed to complete specific tasks effectively. Grasping complicated objects can be challenging due to their shape and potential slippage; RL has nevertheless proven effective in training a robotic arm to learn a grasp policy that securely and reliably grasps cylindrical objects [19]. Overall, the successful application of RL to grasping complicated objects demonstrates the potential of this approach for training robotic arms to handle challenging and diverse object manipulation tasks.
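The simulator-based training described above relies on a standard reset/step interaction loop between agent and environment. A minimal sketch of that interface is shown below; the 1-D "reach the goal" dynamics and reward values are illustrative stand-ins, not MuJoCo itself:

```python
class ToyReachEnv:
    """Minimal simulated environment exposing the reset/step interface
    that physics simulators such as MuJoCo are typically wrapped in.
    The 1-D reaching task here is an illustrative stand-in."""

    def __init__(self, goal=5, max_steps=20):
        self.goal, self.max_steps = goal, max_steps

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action):
        """Advance the simulation one step; action is -1 or +1."""
        self.pos += action
        self.t += 1
        done = self.pos == self.goal or self.t >= self.max_steps
        reward = 1.0 if self.pos == self.goal else -0.01  # small step penalty
        return self.pos, reward, done

# An agent (here a trivial "always move toward the goal" policy) interacts
# with the simulator instead of physical hardware:
env = ToyReachEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(+1)
    total += reward
print(obs, round(total, 2))  # reaches the goal state 5 with return 0.96
```

Because episodes run entirely in software, thousands of such rollouts can be collected safely and cheaply before any policy touches real hardware.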
This study proposes a five-fingered robot hand for grasping a wide range of objects. The robot hand is manufactured using 3D printing technology. A CNN method is used to learn stable grasping and to control the automatic opening and closing of the robot hand's fingers. This method fine-tunes the last layer of a pre-trained network and eliminates the need for human control and decision-making during the grasping process. To further improve accuracy, the RL method is applied to the simulated robot hand in the MuJoCo environment; the use of simulation reduces the cost and risks associated with training on real robotic hardware. Comparing the CNN and RL methods to determine which yields the best grasping performance is a key contribution of the present paper.