Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Takuya Kiyokawa

Osaka University

Cooking Task Planning using LLM and Verified by Graph Network

Mar 27, 2025

Ryunosuke Takebayashi, Vitor Hideyo Isume, Takuya Kiyokawa, Weiwei Wan, Kensuke Harada

Abstract:Cooking tasks remain a challenging problem for robotics due to their complexity. Videos of people cooking are a valuable source of information for such task, but introduces a lot of variability in terms of how to translate this data to a robotic environment. This research aims to streamline this process, focusing on the task plan generation step, by using a Large Language Model (LLM)-based Task and Motion Planning (TAMP) framework to autonomously generate cooking task plans from videos with subtitles, and execute them. Conventional LLM-based task planning methods are not well-suited for interpreting the cooking video data due to uncertainty in the videos, and the risk of hallucination in its output. To address both of these problems, we explore using LLMs in combination with Functional Object-Oriented Networks (FOON), to validate the plan and provide feedback in case of failure. This combination can generate task sequences with manipulation motions that are logically correct and executable by a robot. We compare the execution of the generated plans for 5 cooking recipes from our approach against the plans generated by a few-shot LLM-only approach for a dual-arm robot setup. It could successfully execute 4 of the plans generated by our approach, whereas only 1 of the plans generated by solely using the LLM could be executed.

Via

Access Paper or Ask Questions

Robotic Paper Wrapping by Learning Force Control

Mar 19, 2025

Hiroki Hanai, Takuya Kiyokawa, Weiwei Wan, Kensuke Harada

Abstract:Robotic packaging using wrapping paper poses significant challenges due to the material's complex deformation properties. The packaging process itself involves multiple steps, primarily categorized as folding the paper or creating creases. Small deviations in the robot's arm trajectory or force vector can lead to tearing or wrinkling of the paper, exacerbated by the variability in material properties. This study introduces a novel framework that combines imitation learning and reinforcement learning to enable a robot to perform each step of the packaging process efficiently. The framework allows the robot to follow approximate trajectories of the tool-center point (TCP) based on human demonstrations while optimizing force control parameters to prevent tearing or wrinkling, even with variable wrapping paper materials. The proposed method was validated through ablation studies, which demonstrated successful task completion with a significant reduction in tear and wrinkle rates. Furthermore, the force control strategy proved to be adaptable across different wrapping paper materials and robust against variations in the size of the target object.

Via

Access Paper or Ask Questions

Adaptive Grasping of Moving Objects in Dense Clutter via Global-to-Local Detection and Static-to-Dynamic Planning

Feb 09, 2025

Hao Chen, Takuya Kiyokawa, Weiwei Wan, Kensuke Harada

Abstract:Robotic grasping is facing a variety of real-world uncertainties caused by non-static object states, unknown object properties, and cluttered object arrangements. The difficulty of grasping increases with the presence of more uncertainties, where commonly used learning-based approaches struggle to perform consistently across varying conditions. In this study, we integrate the idea of similarity matching to tackle the challenge of grasping novel objects that are simultaneously in motion and densely cluttered using a single RGBD camera, where multiple uncertainties coexist. We achieve this by shifting visual detection from global to local states and operating grasp planning from static to dynamic scenes. Notably, we introduce optimization methods to enhance planning efficiency for this time-sensitive task. Our proposed system can adapt to various object types, arrangements and movement speeds without the need for extensive training, as demonstrated by real-world experiments.

* Accepted by ICRA2025

Via

Access Paper or Ask Questions

Self-Supervised Learning of Grasping Arbitrary Objects On-the-Move

Nov 15, 2024

Takuya Kiyokawa, Eiki Nagata, Yoshihisa Tsurumine, Yuhwan Kwon, Takamitsu Matsubara

Figure 1 for Self-Supervised Learning of Grasping Arbitrary Objects On-the-Move

Figure 2 for Self-Supervised Learning of Grasping Arbitrary Objects On-the-Move

Figure 3 for Self-Supervised Learning of Grasping Arbitrary Objects On-the-Move

Figure 4 for Self-Supervised Learning of Grasping Arbitrary Objects On-the-Move

Abstract:Mobile grasping enhances manipulation efficiency by utilizing robots' mobility. This study aims to enable a commercial off-the-shelf robot for mobile grasping, requiring precise timing and pose adjustments. Self-supervised learning can develop a generalizable policy to adjust the robot's velocity and determine grasp position and orientation based on the target object's shape and pose. Due to mobile grasping's complexity, action primitivization and step-by-step learning are crucial to avoid data sparsity in learning from trial and error. This study simplifies mobile grasping into two grasp action primitives and a moving action primitive, which can be operated with limited degrees of freedom for the manipulator. This study introduces three fully convolutional neural network (FCN) models to predict static grasp primitive, dynamic grasp primitive, and residual moving velocity error from visual inputs. A two-stage grasp learning approach facilitates seamless FCN model learning. The ablation study demonstrated that the proposed method achieved the highest grasping accuracy and pick-and-place efficiency. Furthermore, randomizing object shapes and environments in the simulation effectively achieved generalizable mobile grasping.

* 8 pages, 9 figures

Via

Access Paper or Ask Questions

Task-Difficulty-Aware Efficient Object Arrangement Leveraging Tossing Motions

Nov 06, 2024

Takuya Kiyokawa, Mahiro Muta, Weiwei Wan, Kensuke Harada

Figure 1 for Task-Difficulty-Aware Efficient Object Arrangement Leveraging Tossing Motions

Figure 2 for Task-Difficulty-Aware Efficient Object Arrangement Leveraging Tossing Motions

Figure 3 for Task-Difficulty-Aware Efficient Object Arrangement Leveraging Tossing Motions

Figure 4 for Task-Difficulty-Aware Efficient Object Arrangement Leveraging Tossing Motions

Abstract:This study explores a pick-and-toss (PT) as an alternative to pick-and-place (PP), allowing a robot to extend its range and improve task efficiency. Although PT boosts efficiency in object arrangement, the placement environment critically affects the success of tossing. To achieve accurate and efficient object arrangement, we suggest choosing between PP and PT based on task difficulty estimated from the placement environment. Our method simultaneously learns the tossing motion through self-supervised learning and the task determination policy via brute-force search. Experimental results validate the proposed method through simulations and real-world tests on various rectangular object arrangements.

* 4 pages, 6 figures

Via

Access Paper or Ask Questions

WasteGAN: Data Augmentation for Robotic Waste Sorting through Generative Adversarial Networks

Sep 25, 2024

Alberto Bacchin, Leonardo Barcellona, Matteo Terreran, Stefano Ghidoni, Emanuele Menegatti, Takuya Kiyokawa

Abstract:Robotic waste sorting poses significant challenges in both perception and manipulation, given the extreme variability of objects that should be recognized on a cluttered conveyor belt. While deep learning has proven effective in solving complex tasks, the necessity for extensive data collection and labeling limits its applicability in real-world scenarios like waste sorting. To tackle this issue, we introduce a data augmentation method based on a novel GAN architecture called wasteGAN. The proposed method allows to increase the performance of semantic segmentation models, starting from a very limited bunch of labeled examples, such as few as 100. The key innovations of wasteGAN include a novel loss function, a novel activation function, and a larger generator block. Overall, such innovations helps the network to learn from limited number of examples and synthesize data that better mirrors real-world distributions. We then leverage the higher-quality segmentation masks predicted from models trained on the wasteGAN synthetic data to compute semantic-aware grasp poses, enabling a robotic arm to effectively recognizing contaminants and separating waste in a real-world scenario. Through comprehensive evaluation encompassing dataset-based assessments and real-world experiments, our methodology demonstrated promising potential for robotic waste sorting, yielding performance gains of up to 5.8\% in picking contaminants. The project page is available at https://github.com/bach05/wasteGAN.git

* Accepted at 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

Via

Access Paper or Ask Questions

Component Selection for Craft Assembly Tasks

Jul 19, 2024

Vitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe, Yukiyasu Domae, Weiwei Wan, Kensuke Harada

Figure 1 for Component Selection for Craft Assembly Tasks

Figure 2 for Component Selection for Craft Assembly Tasks

Figure 3 for Component Selection for Craft Assembly Tasks

Figure 4 for Component Selection for Craft Assembly Tasks

Abstract:Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labelled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.

* Submitted to IEEE RA-L

Via

Access Paper or Ask Questions

Many-Objective-Optimized Semi-Automated Robotic Disassembly Sequences

Jan 03, 2024

Takuya Kiyokawa, Kensuke Harada, Weiwei Wan, Tomoki Ishikura, Naoya Miyaji, Genichiro Matsuda

Abstract:This study tasckles the problem of many-objective sequence optimization for semi-automated robotic disassembly operations. To this end, we employ a many-objective genetic algorithm (MaOGA) algorithm inspired by the Non-dominated Sorting Genetic Algorithm (NSGA)-III, along with robotic-disassembly-oriented constraints and objective functions derived from geometrical and robot simulations using 3-dimensional (3D) geometrical information stored in a 3D Computer-Aided Design (CAD) model of the target product. The MaOGA begins by generating a set of initial chromosomes based on a contact and connection graph (CCG), rather than random chromosomes, to avoid falling into a local minimum and yield repeatable convergence. The optimization imposes constraints on feasibility and stability as well as objective functions regarding difficulty, efficiency, prioritization, and allocability to generate a sequence that satisfies many preferred conditions under mandatory requirements for semi-automated robotic disassembly. The NSGA-III-inspired MaOGA also utilizes non-dominated sorting and niching with reference lines to further encourage steady and stable exploration and uniformly lower the overall evaluation values. Our sequence generation experiments for a complex product (36 parts) demonstrated that the proposed method can consistently produce feasible and stable sequences with a 100% success rate, bringing the multiple preferred conditions closer to the optimal solution required for semi-automated robotic disassembly operations.

Via

Access Paper or Ask Questions

Probabilistic Slide-support Manipulation Planning in Clutter

Jun 22, 2023

Shusei Nagato, Tomohiro Motoda, Takao Nishi, Petit Damien, Takuya Kiyokawa, Weiwei Wan, Kensuke Harada

Figure 1 for Probabilistic Slide-support Manipulation Planning in Clutter

Figure 2 for Probabilistic Slide-support Manipulation Planning in Clutter

Figure 3 for Probabilistic Slide-support Manipulation Planning in Clutter

Figure 4 for Probabilistic Slide-support Manipulation Planning in Clutter

Abstract:To safely and efficiently extract an object from the clutter, this paper presents a bimanual manipulation planner in which one hand of the robot is used to slide the target object out of the clutter while the other hand is used to support the surrounding objects to prevent the clutter from collapsing. Our method uses a neural network to predict the physical phenomena of the clutter when the target object is moved. We generate the most efficient action based on the Monte Carlo tree search.The grasping and sliding actions are planned to minimize the number of motion sequences to pick the target object. In addition, the object to be supported is determined to minimize the position change of surrounding objects. Experiments with a real bimanual robot confirmed that the robot could retrieve the target object, reducing the total number of motion sequences and improving safety.

* IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023) (Accepted)

Via

Access Paper or Ask Questions

Bounding Box Annotation with Visible Status

Apr 11, 2023

Takuya Kiyokawa, Naoki Shirakura, Hiroki Katayama, Keita Tomochika, Jun Takamatsu

Abstract:Training deep-learning-based vision systems requires the manual annotation of a significant amount of data to optimize several parameters of the deep convolutional neural networks. Such manual annotation is highly time-consuming and labor-intensive. To reduce this burden, a previous study presented a fully automated annotation approach that does not require any manual intervention. The proposed method associates a visual marker with an object and captures it in the same image. However, because the previous method relied on moving the object within the capturing range using a fixed-point camera, the collected image dataset was limited in terms of capturing viewpoints. To overcome this limitation, this study presents a mobile application-based free-viewpoint image-capturing method. With the proposed application, users can collect multi-view image datasets automatically that are annotated with bounding boxes by moving the camera. However, capturing images through human involvement is laborious and monotonous. Therefore, we propose gamified application features to track the progress of the collection status. Our experiments demonstrated that using the gamified mobile application for bounding box annotation, with visible collection progress status, can motivate users to collect multi-view object image datasets with less mental workload and time pressure in an enjoyable manner, leading to increased engagement.

* 10 pages, 16 figures

Via

Access Paper or Ask Questions