Abstract: Autonomous agents capable of diverse object manipulations should be able to acquire a wide range of manipulation skills with high reusability. Although advances in deep learning have made it increasingly feasible to replicate the dexterity of human teleoperation in robots, generalizing these acquired skills to previously unseen scenarios remains a significant challenge. In this study, we propose a novel algorithm, Gaze-based Bottleneck-aware Robot Manipulation (GazeBot), which enables high reusability of learned motions even when object positions and end-effector poses differ from those in the provided demonstrations. By leveraging gaze information and motion bottlenecks, both crucial features for object manipulation, GazeBot achieves higher generalization performance than state-of-the-art imitation learning methods without sacrificing dexterity or reactivity. Furthermore, the training process of GazeBot is entirely data-driven once a demonstration dataset with gaze data is provided. Videos and code are available at https://crumbyrobotics.github.io/gazebot.
Abstract: In imitation learning for robotic manipulation, decomposing object manipulation tasks into multiple semantic actions is essential. This decomposition enables the reuse of learned skills in varying contexts and the combination of acquired skills to perform novel tasks, rather than merely replicating demonstrated motions. Gaze, an evolved tool for understanding ongoing events, plays a critical role in human object manipulation, where it strongly correlates with motion planning. In this study, we propose a simple yet robust task decomposition method based on gaze transitions. We hypothesize that an imitation agent's gaze control, which fixates on specific landmarks and transitions between them, naturally segments demonstrated manipulations into sub-tasks. Notably, our method achieves consistent task decomposition across all demonstrations, which is desirable in contexts such as machine learning. Using teleoperation, a common modality in imitation learning for robotic manipulation, we collected demonstration data for various tasks, applied our segmentation method, and evaluated the characteristics and consistency of the resulting sub-tasks. Furthermore, through extensive testing across a wide range of hyperparameter variations, we demonstrated that the proposed method possesses the robustness necessary for application to different robotic systems.
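To make the idea of gaze-transition segmentation concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes gaze is recorded as one 2D point per timestep (in pixels) and treats any frame-to-frame jump larger than a threshold as a transition between landmarks. The function name `segment_by_gaze_transitions` and the parameters `jump_threshold` and `min_fixation_len` are hypothetical and chosen only for illustration; the abstract does not specify how transitions are detected.

```python
import math


def segment_by_gaze_transitions(gaze, jump_threshold=50.0, min_fixation_len=3):
    """Split a demonstration into sub-task segments at large gaze jumps.

    gaze: list of (x, y) gaze positions, one per timestep (pixels assumed).
    jump_threshold: hypothetical distance above which a frame-to-frame
        gaze displacement counts as a transition to a new landmark.
    min_fixation_len: hypothetical minimum number of timesteps a segment
        must last before another boundary is accepted.
    Returns a list of (start, end) index pairs, with end exclusive.
    """
    if not gaze:
        return []

    boundaries = [0]
    for t in range(1, len(gaze)):
        displacement = math.dist(gaze[t], gaze[t - 1])
        long_enough = t - boundaries[-1] >= min_fixation_len
        if displacement > jump_threshold and long_enough:
            boundaries.append(t)
    boundaries.append(len(gaze))

    # Pair consecutive boundaries into half-open segments.
    return list(zip(boundaries[:-1], boundaries[1:]))


# Synthetic gaze trace: three fixations on different (made-up) landmarks.
demo = [(100, 100)] * 5 + [(300, 120)] * 4 + [(310, 400)] * 6
segments = segment_by_gaze_transitions(demo)
print(segments)  # → [(0, 5), (5, 9), (9, 15)]
```

In this toy trace, each fixation becomes one segment. The two parameters stand in for the kind of hyperparameters whose variation the abstract reports testing; the real method may use different criteria.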