Osaka University, AIST
Abstract:Cooking tasks remain a challenging problem for robotics due to their complexity. Videos of people cooking are a valuable source of information for such task, but introduces a lot of variability in terms of how to translate this data to a robotic environment. This research aims to streamline this process, focusing on the task plan generation step, by using a Large Language Model (LLM)-based Task and Motion Planning (TAMP) framework to autonomously generate cooking task plans from videos with subtitles, and execute them. Conventional LLM-based task planning methods are not well-suited for interpreting the cooking video data due to uncertainty in the videos, and the risk of hallucination in its output. To address both of these problems, we explore using LLMs in combination with Functional Object-Oriented Networks (FOON), to validate the plan and provide feedback in case of failure. This combination can generate task sequences with manipulation motions that are logically correct and executable by a robot. We compare the execution of the generated plans for 5 cooking recipes from our approach against the plans generated by a few-shot LLM-only approach for a dual-arm robot setup. It could successfully execute 4 of the plans generated by our approach, whereas only 1 of the plans generated by solely using the LLM could be executed.
Abstract:Robotic packaging using wrapping paper poses significant challenges due to the material's complex deformation properties. The packaging process itself involves multiple steps, primarily categorized as folding the paper or creating creases. Small deviations in the robot's arm trajectory or force vector can lead to tearing or wrinkling of the paper, exacerbated by the variability in material properties. This study introduces a novel framework that combines imitation learning and reinforcement learning to enable a robot to perform each step of the packaging process efficiently. The framework allows the robot to follow approximate trajectories of the tool-center point (TCP) based on human demonstrations while optimizing force control parameters to prevent tearing or wrinkling, even with variable wrapping paper materials. The proposed method was validated through ablation studies, which demonstrated successful task completion with a significant reduction in tear and wrinkle rates. Furthermore, the force control strategy proved to be adaptable across different wrapping paper materials and robust against variations in the size of the target object.
Abstract:Simultaneously grasping and transporting multiple objects can significantly enhance robotic work efficiency and has been a key research focus for decades. The primary challenge lies in determining how to push objects, group them, and execute simultaneous grasping for respective groups while considering object distribution and the hardware constraints of the robot. Traditional rule-based methods struggle to flexibly adapt to diverse scenarios. To address this challenge, this paper proposes an imitation learning-based approach. We collect a series of expert demonstrations through teleoperation and train a diffusion policy network, enabling the robot to dynamically generate action sequences for pushing, grouping, and grasping, thereby facilitating efficient multi-object grasping and transportation. We conducted experiments to evaluate the method under different training dataset sizes, varying object quantities, and real-world object scenarios. The results demonstrate that the proposed approach can effectively and adaptively generate multi-object grouping and grasping strategies. With the support of more training data, imitation learning is expected to be an effective approach for solving the multi-object grasping problem.
Abstract:Robotic grasping is facing a variety of real-world uncertainties caused by non-static object states, unknown object properties, and cluttered object arrangements. The difficulty of grasping increases with the presence of more uncertainties, where commonly used learning-based approaches struggle to perform consistently across varying conditions. In this study, we integrate the idea of similarity matching to tackle the challenge of grasping novel objects that are simultaneously in motion and densely cluttered using a single RGBD camera, where multiple uncertainties coexist. We achieve this by shifting visual detection from global to local states and operating grasp planning from static to dynamic scenes. Notably, we introduce optimization methods to enhance planning efficiency for this time-sensitive task. Our proposed system can adapt to various object types, arrangements and movement speeds without the need for extensive training, as demonstrated by real-world experiments.
Abstract:Soft pneumatic fingers are of great research interest. However, their significant potential is limited as most of them can generate only one motion, mostly bending. The conventional design of soft fingers does not allow them to switch to another motion mode. In this paper, we developed a novel multi-modal and single-actuated soft finger where its motion mode is switched by changing the finger's temperature. Our soft finger is capable of switching between three distinctive motion modes: bending, twisting, and extension-in approximately five seconds. We carried out a detailed experimental study of the soft finger and evaluated its repeatability and range of motion. It exhibited repeatability of around one millimeter and a fifty percent larger range of motion than a standard bending actuator. We developed an analytical model for a fiber-reinforced soft actuator for twisting motion. This helped us relate the input pressure to the output twist radius of the twisting motion. This model was validated by experimental verification. Further, a soft robotic gripper with multiple grasp modes was developed using three actuators. This gripper can adapt to and grasp objects of a large range of size, shape, and stiffness. We showcased its grasping capabilities by successfully grasping a small berry, a large roll, and a delicate tofu cube.
Abstract:Manipulating deformable objects in robotic cells is often costly and not widely accessible. However, the use of localized pneumatic gripping systems can enhance accessibility. Current methods that use pneumatic grippers to handle deformable objects struggle with effective lifting. This paper introduces a method for the dexterous lifting of textile deformable objects from one edge, utilizing a previously developed gripper designed for flexible and porous materials. By precisely adjusting the orientation and position of the gripper during the lifting process, we were able to significantly reduce necessary gripping force and minimize object vibration caused by airflow. This method was tested and validated on four materials with varying mass, friction, and flexibility. The proposed approach facilitates the lifting of deformable objects from a conveyor or automated line, even when only one edge is accessible for grasping. Future work will involve integrating a vision system to optimize the manipulation of deformable objects with more complex shapes.
Abstract:This study explores a pick-and-toss (PT) as an alternative to pick-and-place (PP), allowing a robot to extend its range and improve task efficiency. Although PT boosts efficiency in object arrangement, the placement environment critically affects the success of tossing. To achieve accurate and efficient object arrangement, we suggest choosing between PP and PT based on task difficulty estimated from the placement environment. Our method simultaneously learns the tossing motion through self-supervised learning and the task determination policy via brute-force search. Experimental results validate the proposed method through simulations and real-world tests on various rectangular object arrangements.
Abstract:This paper proposes a control method to address the physical Human-Robot Interaction (pHRI) challenge in the context of hierarchical tasks. A common approach to managing hierarchical tasks is Hierarchical Quadratic Programming (HQP), which, however, cannot be directly applied to human interaction due to its allowance of arbitrary velocity direction adjustments. To resolve this limitation, we introduce the concept of directional constraints and develop a direction-constrained optimization algorithm to handle the nonlinearities induced by these constraints. The algorithm solves two sub-problems, minimizing the error and minimizing the deviation angle, in parallel, and combines the results of the two sub-problems to produce a final optimal outcome. The mutual influence between these two sub-problems is analyzed to determine the best parameter for combination. Additionally, the velocity objective in our control framework is computed using a variable admittance controller. Traditional admittance control does not account for constraints. To address this issue, we propose a variable admittance control method to adjust control objectives dynamically. The method helps reduce the deviation between robot velocity and human intention at the constraint boundaries, thereby enhancing interaction efficiency. We evaluate the proposed method in scenarios where a human operator physically interacts with a 7-degree-of-freedom robotic arm. The results highlight the importance of incorporating directional constraints in pHRI for hierarchical tasks. Compared to existing methods, our approach generates smoother robotic trajectories during interaction while avoiding interaction delays at the constraint boundaries.
Abstract:Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labelled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.
Abstract:This work presents a framework for a robot with a multi-fingered hand to freely utilize daily tools, including functional parts like buttons and triggers. An approach heatmap is generated by selecting a functional finger, indicating optimal palm positions on the object's surface that enable the functional finger to contact the tool's functional part. Once the palm position is identified through the heatmap, achieving the functional grasp becomes a straightforward process where the fingers stably grasp the object with low-dimensional inputs using the eigengrasp. As our approach does not need human demonstrations, it can easily adapt to various sizes and designs, extending its applicability to different objects. In our approach, we use directional manipulability to obtain the approach heatmap. In addition, we add two kinds of energy functions, i.e., palm energy and functional energy functions, to realize the eigengrasp. Using this method, each robotic gripper can autonomously identify its optimal workspace for functional grasping, extending its applicability to non-anthropomorphic robotic hands. We show that several daily tools like spray, drill, and remotes can be efficiently used by not only an anthropomorphic Shadow hand but also a non-anthropomorphic Barrett hand.