Abstract:Learning to navigate in unstructured environments is a challenging task for robots. While reinforcement learning can be effective, it often requires extensive data collection and can pose risk. Learning from expert demonstrations, on the other hand, offers a more efficient approach. However, many existing methods rely on specific robot embodiments, pre-specified target images and require large datasets. We propose the Visual Demonstration-based Embodiment-agnostic Navigation (ViDEN) framework, a novel framework that leverages visual demonstrations to train embodiment-agnostic navigation policies. ViDEN utilizes depth images to reduce input dimensionality and relies on relative target positions, making it more adaptable to diverse environments. By training a diffusion-based policy on task-centric and embodiment-agnostic demonstrations, ViDEN can generate collision-free and adaptive trajectories in real-time. Our experiments on human reaching and tracking demonstrate that ViDEN outperforms existing methods, requiring a small amount of data and achieving superior performance in various indoor and outdoor navigation scenarios. Project website: https://nimicurtis.github.io/ViDEN/.
Abstract:Robotic hands offer advanced manipulation capabilities, while their complexity and cost often limit their real-world applications. In contrast, simple parallel grippers, though affordable, are restricted to basic tasks like pick-and-place. Recently, a vibration-based mechanism was proposed to augment parallel grippers and enable in-hand manipulation capabilities for thin objects. By utilizing the stick-slip phenomenon, a simple controller was able to drive a grasped object to a desired position. However, due to the underactuated nature of the mechanism, direct control of the object's orientation was not possible. In this letter, we address the challenge of manipulating the entire state of the object. Hence, we present the excitation of a cyclic phenomenon where the object's center-of-mass rotates in a constant radius about the grasping point. With this cyclic motion, we propose an algorithm for manipulating the object to desired states. In addition to a full analytical analysis of the cyclic phenomenon, we propose the use of duty cycle modulation in operating the vibration actuator to provide more accurate manipulation. Finite element analysis, experiments and task demonstrations validate the proposed algorithm.
Abstract:Dynamic hand gestures play a crucial role in conveying nonverbal information for Human-Robot Interaction (HRI), eliminating the need for complex interfaces. Current models for dynamic gesture recognition suffer from limitations in effective recognition range, restricting their application to close proximity scenarios. In this letter, we present a novel approach to recognizing dynamic gestures in an ultra-range distance of up to 28 meters, enabling natural, directive communication for guiding robots in both indoor and outdoor environments. Our proposed SlowFast-Transformer (SFT) model effectively integrates the SlowFast architecture with Transformer layers to efficiently process and classify gesture sequences captured at ultra-range distances, overcoming challenges of low resolution and environmental noise. We further introduce a distance-weighted loss function shown to enhance learning and improve model robustness at varying distances. Our model demonstrates significant performance improvement over state-of-the-art gesture recognition frameworks, achieving a recognition accuracy of 95.1% on a diverse dataset with challenging ultra-range gestures. This enables robots to react appropriately to human commands from a far distance, providing an essential enhancement in HRI, especially in scenarios requiring seamless and natural interaction.
Abstract:Accurate human pose estimation is essential for effective Human-Robot Interaction (HRI). By observing a user's arm movements, robots can respond appropriately, whether it's providing assistance or avoiding collisions. While visual perception offers potential for human pose estimation, it can be hindered by factors like poor lighting or occlusions. Additionally, wearable inertial sensors, though useful, require frequent calibration as they do not provide absolute position information. Force-myography (FMG) is an alternative approach where muscle perturbations are externally measured. It has been used to observe finger movements, but its application to full arm state estimation is unexplored. In this letter, we investigate the use of a wearable FMG device that can observe the state of the human arm for real-time applications of HRI. We propose a Transformer-based model to map FMG measurements from the shoulder of the user to the physical pose of the arm. The model is also shown to be transferable to other users with limited decline in accuracy. Through real-world experiments with a robotic arm, we demonstrate collision avoidance without relying on visual perception.
Abstract:Compared to rigid hands, underactuated compliant hands offer greater adaptability to object shapes, provide stable grasps, and are often more cost-effective. However, they introduce uncertainties in hand-object interactions due to their inherent compliance and lack of precise finger proprioception as in rigid hands. These limitations become particularly significant when performing contact-rich tasks like insertion. To address these challenges, additional sensing modalities are required to enable robust insertion capabilities. This letter explores the essential sensing requirements for successful insertion tasks with compliant hands, focusing on the role of visuotactile perception. We propose a simulation-based multimodal policy learning framework that leverages all-around tactile sensing and an extrinsic depth camera. A transformer-based policy, trained through a teacher-student distillation process, is successfully transferred to a real-world robotic system without further training. Our results emphasize the crucial role of tactile sensing in conjunction with visual perception for accurate object-socket pose estimation, successful sim-to-real transfer and robust task execution.
Abstract:Modern manipulators are acclaimed for their precision but often struggle to operate in confined spaces. This limitation has driven the development of hyper-redundant and continuum robots. While these present unique advantages, they face challenges in, for instance, weight, mechanical complexity, modeling and costs. The Minimally Actuated Serial Robot (MASR) has been proposed as a light-weight, low-cost and simpler alternative where passive joints are actuated with a Mobile Actuator (MA) moving along the arm. Yet, Inverse Kinematics (IK) and a general motion planning algorithm for the MASR have not be addressed. In this letter, we propose the MASR-RRT* motion planning algorithm specifically developed for the unique kinematics of MASR. The main component of the algorithm is a data-based model for solving the IK problem while considering minimal traverse of the MA. The model is trained solely using the forward kinematics of the MASR and does not require real data. With the model as a local-connection mechanism, MASR-RRT* minimizes a cost function expressing the action time. In a comprehensive analysis, we show that MASR-RRT* is superior in performance to the straight-forward implementation of the standard RRT*. Experiments on a real robot in different environments with obstacles validate the proposed algorithm.
Abstract:This paper presents a novel approach for ultra-range gesture recognition, addressing Human-Robot Interaction (HRI) challenges over extended distances. By leveraging human gestures in video data, we propose the Temporal-Spatiotemporal Fusion Network (TSFN) model that surpasses the limitations of current methods, enabling robots to understand gestures from long distances. With applications in service robots, search and rescue operations, and drone-based interactions, our approach enhances HRI in expansive environments. Experimental validation demonstrates significant advancements in gesture recognition accuracy, particularly in prolonged gesture sequences.
Abstract:Dynamic gestures enable the transfer of directive information to a robot. Moreover, the ability of a robot to recognize them from a long distance makes communication more effective and practical. However, current state-of-the-art models for dynamic gestures exhibit limitations in recognition distance, typically achieving effective performance only within a few meters. In this work, we propose a model for recognizing dynamic gestures from a long distance of up to 20 meters. The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames. SFT demonstrates superior performance over existing models.
Abstract:Object recognition, commonly performed by a camera, is a fundamental requirement for robots to complete complex tasks. Some tasks require recognizing objects far from the robot's camera. A challenging example is Ultra-Range Gesture Recognition (URGR) in human-robot interaction where the user exhibits directive gestures at a distance of up to 25~m from the robot. However, training a model to recognize hardly visible objects located in ultra-range requires an exhaustive collection of a significant amount of labeled samples. The generation of synthetic training datasets is a recent solution to the lack of real-world data, while unable to properly replicate the realistic visual characteristics of distant objects in images. In this letter, we propose the Diffusion in Ultra-Range (DUR) framework based on a Diffusion model to generate labeled images of distant objects in various scenes. The DUR generator receives a desired distance and class (e.g., gesture) and outputs a corresponding synthetic image. We apply DUR to train a URGR model with directive gestures in which fine details of the gesturing hand are challenging to distinguish. DUR is compared to other types of generative models showcasing superiority both in fidelity and in recognition success rate when training a URGR model. More importantly, training a DUR model on a limited amount of real data and then using it to generate synthetic data for training a URGR model outperforms directly training the URGR model on real data. The synthetic-based URGR model is also demonstrated in gesture-based direction of a ground robot.
Abstract:Robotic arms are highly common in various automation processes such as manufacturing lines. However, these highly capable robots are usually degraded to simple repetitive tasks such as pick-and-place. On the other hand, designing an optimal robot for one specific task consumes large resources of engineering time and costs. In this paper, we propose a novel concept for optimizing the fitness of a robotic arm to perform a specific task based on human demonstration. Fitness of a robot arm is a measure of its ability to follow recorded human arm and hand paths. The optimization is conducted using a modified variant of the Particle Swarm Optimization for the robot design problem. In the proposed approach, we generate an optimal robot design along with the required path to complete the task. The approach could reduce the time-to-market of robotic arms and enable the standardization of modular robotic parts. Novice users could easily apply a minimal robot arm to various tasks. Two test cases of common manufacturing tasks are presented yielding optimal designs and reduced computational effort by up to 92%.