Abstract:Accurate human pose estimation is essential for effective Human-Robot Interaction (HRI). By observing a user's arm movements, robots can respond appropriately, whether it's providing assistance or avoiding collisions. While visual perception offers potential for human pose estimation, it can be hindered by factors like poor lighting or occlusions. Additionally, wearable inertial sensors, though useful, require frequent calibration as they do not provide absolute position information. Force-myography (FMG) is an alternative approach where muscle perturbations are externally measured. It has been used to observe finger movements, but its application to full arm state estimation is unexplored. In this letter, we investigate the use of a wearable FMG device that can observe the state of the human arm for real-time applications of HRI. We propose a Transformer-based model to map FMG measurements from the shoulder of the user to the physical pose of the arm. The model is also shown to be transferable to other users with limited decline in accuracy. Through real-world experiments with a robotic arm, we demonstrate collision avoidance without relying on visual perception.
Abstract:Modern manipulators are acclaimed for their precision but often struggle to operate in confined spaces. This limitation has driven the development of hyper-redundant and continuum robots. While these present unique advantages, they face challenges in, for instance, weight, mechanical complexity, modeling and costs. The Minimally Actuated Serial Robot (MASR) has been proposed as a light-weight, low-cost and simpler alternative where passive joints are actuated with a Mobile Actuator (MA) moving along the arm. Yet, Inverse Kinematics (IK) and a general motion planning algorithm for the MASR have not be addressed. In this letter, we propose the MASR-RRT* motion planning algorithm specifically developed for the unique kinematics of MASR. The main component of the algorithm is a data-based model for solving the IK problem while considering minimal traverse of the MA. The model is trained solely using the forward kinematics of the MASR and does not require real data. With the model as a local-connection mechanism, MASR-RRT* minimizes a cost function expressing the action time. In a comprehensive analysis, we show that MASR-RRT* is superior in performance to the straight-forward implementation of the standard RRT*. Experiments on a real robot in different environments with obstacles validate the proposed algorithm.
Abstract:This paper presents a novel approach for ultra-range gesture recognition, addressing Human-Robot Interaction (HRI) challenges over extended distances. By leveraging human gestures in video data, we propose the Temporal-Spatiotemporal Fusion Network (TSFN) model that surpasses the limitations of current methods, enabling robots to understand gestures from long distances. With applications in service robots, search and rescue operations, and drone-based interactions, our approach enhances HRI in expansive environments. Experimental validation demonstrates significant advancements in gesture recognition accuracy, particularly in prolonged gesture sequences.
Abstract:Dynamic gestures enable the transfer of directive information to a robot. Moreover, the ability of a robot to recognize them from a long distance makes communication more effective and practical. However, current state-of-the-art models for dynamic gestures exhibit limitations in recognition distance, typically achieving effective performance only within a few meters. In this work, we propose a model for recognizing dynamic gestures from a long distance of up to 20 meters. The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames. SFT demonstrates superior performance over existing models.
Abstract:Object recognition, commonly performed by a camera, is a fundamental requirement for robots to complete complex tasks. Some tasks require recognizing objects far from the robot's camera. A challenging example is Ultra-Range Gesture Recognition (URGR) in human-robot interaction where the user exhibits directive gestures at a distance of up to 25~m from the robot. However, training a model to recognize hardly visible objects located in ultra-range requires an exhaustive collection of a significant amount of labeled samples. The generation of synthetic training datasets is a recent solution to the lack of real-world data, while unable to properly replicate the realistic visual characteristics of distant objects in images. In this letter, we propose the Diffusion in Ultra-Range (DUR) framework based on a Diffusion model to generate labeled images of distant objects in various scenes. The DUR generator receives a desired distance and class (e.g., gesture) and outputs a corresponding synthetic image. We apply DUR to train a URGR model with directive gestures in which fine details of the gesturing hand are challenging to distinguish. DUR is compared to other types of generative models showcasing superiority both in fidelity and in recognition success rate when training a URGR model. More importantly, training a DUR model on a limited amount of real data and then using it to generate synthetic data for training a URGR model outperforms directly training the URGR model on real data. The synthetic-based URGR model is also demonstrated in gesture-based direction of a ground robot.
Abstract:Tendon-based underactuated hands are intended to be simple, compliant and affordable. Often, they are 3D printed and do not include tactile sensors. Hence, performing in-hand object recognition with direct touch sensing is not feasible. Adding tactile sensors can complicate the hardware and introduce extra costs to the robotic hand. Also, the common approach of visual perception may not be available due to occlusions. In this paper, we explore whether kinesthetic haptics can provide in-direct information regarding the geometry of a grasped object during in-hand manipulation with an underactuated hand. By solely sensing actuator positions and torques over a period of time during motion, we show that a classifier can recognize an object from a set of trained ones with a high success rate of almost 95%. In addition, the implementation of a real-time majority vote during manipulation further improves recognition. Additionally, a trained classifier is also shown to be successful in distinguishing between shape categories rather than just specific objects.
Abstract:Robotic arms are highly common in various automation processes such as manufacturing lines. However, these highly capable robots are usually degraded to simple repetitive tasks such as pick-and-place. On the other hand, designing an optimal robot for one specific task consumes large resources of engineering time and costs. In this paper, we propose a novel concept for optimizing the fitness of a robotic arm to perform a specific task based on human demonstration. Fitness of a robot arm is a measure of its ability to follow recorded human arm and hand paths. The optimization is conducted using a modified variant of the Particle Swarm Optimization for the robot design problem. In the proposed approach, we generate an optimal robot design along with the required path to complete the task. The approach could reduce the time-to-market of robotic arms and enable the standardization of modular robotic parts. Novice users could easily apply a minimal robot arm to various tasks. Two test cases of common manufacturing tasks are presented yielding optimal designs and reduced computational effort by up to 92%.
Abstract:Robotic hands are an important tool for replacing humans in handling toxic or radioactive materials. However, these are usually highly expensive, and in many cases, once they are contaminated, they cannot be re-used. Some solutions cope with this challenge by 3D printing parts of a tendon-based hand. However, fabrication requires additional assembly steps. Therefore, a novice user may have difficulties fabricating a hand upon contamination of the previous one. We propose the Print-N-Grip (PNG) hand which is a tendon-based underactuated mechanism able to adapt to the shape of objects. The hand is fabricated through one-shot 3D printing with no additional engineering effort, and can accommodate a number of fingers as desired by the practitioner. Due to its low cost, the PNG hand can easily be detached from a universal base for disposing upon contamination, and replaced by a newly printed one. In addition, the PNG hand is scalable such that one can effortlessly resize the computerized model and print. We present the design of the PNG hand along with experiments to show the capabilities and high durability of the hand.
Abstract:Human dexterity is an invaluable capability for precise manipulation of objects in complex tasks. The capability of robots to similarly grasp and perform in-hand manipulation of objects is critical for their use in the ever changing human environment, and for their ability to replace manpower. In recent decades, significant effort has been put in order to enable in-hand manipulation capabilities to robotic systems. Initial robotic manipulators followed carefully programmed paths, while later attempts provided a solution based on analytical modeling of motion and contact. However, these have failed to provide practical solutions due to inability to cope with complex environments and uncertainties. Therefore, the effort has shifted to learning-based approaches where data is collected from the real world or through a simulation, during repeated attempts to complete various tasks. The vast majority of learning approaches focused on learning data-based models that describe the system to some extent or Reinforcement Learning (RL). RL, in particular, has seen growing interest due to the remarkable ability to generate solutions to problems with minimal human guidance. In this survey paper, we track the developments of learning approaches for in-hand manipulations and, explore the challenges and opportunities. This survey is designed both as an introduction for novices in the field with a glossary of terms as well as a guide of novel advances for advanced practitioners.
Abstract:Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human-Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short distance range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem by aiming for a recognition distance of up to 25 meters and in the context of HRI. We propose a novel deep-learning framework for URGR using solely a simple RGB camera. First, a novel super-resolution model termed HQ-Net is used to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT) which takes the enhanced image as input. GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework has also exhibited superior performance compared to human recognition in ultra-range distances. With the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments.