Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Takayuki Kanai

Where Do We Look When We Teach? Analyzing Human Gaze Behavior Across Demonstration Devices in Robot Imitation Learning

Jun 06, 2025

Yutaro Ishida, Takamitsu Matsubara, Takayuki Kanai, Kazuhiro Shintani, Hiroshi Bito

Abstract:Imitation learning for acquiring generalizable policies often requires a large volume of demonstration data, making the process significantly costly. One promising strategy to address this challenge is to leverage the cognitive and decision-making skills of human demonstrators with strong generalization capability, particularly by extracting task-relevant cues from their gaze behavior. However, imitation learning typically involves humans collecting data using demonstration devices that emulate a robot's embodiment and visual condition. This raises the question of how such devices influence gaze behavior. We propose an experimental framework that systematically analyzes demonstrators' gaze behavior across a spectrum of demonstration devices. Our experimental results indicate that devices emulating (1) a robot's embodiment or (2) visual condition impair demonstrators' capability to extract task-relevant cues via gaze behavior, with the extent of impairment depending on the degree of emulation. Additionally, gaze data collected using devices that capture natural human behavior improves the policy's task success rate from 18.8% to 68.8% under environmental shifts.

Via

Access Paper or Ask Questions

Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions

Oct 02, 2024

Yutaro Ishida, Yuki Noguchi, Takayuki Kanai, Kazuhiro Shintani, Hiroshi Bito

Figure 1 for Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions

Figure 2 for Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions

Figure 3 for Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions

Figure 4 for Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions

Abstract:We study how to generalize the visuomotor policy of a mobile manipulator from the perspective of visual observations. The mobile manipulator is prone to occlusion owing to its own body when only a single viewpoint is employed and a significant domain shift when deployed in diverse situations. However, to the best of the authors' knowledge, no study has been able to solve occlusion and domain shift simultaneously and propose a robust policy. In this paper, we propose a robust imitation learning method for mobile manipulators that focuses on task-related viewpoints and their spatial regions when observing multiple viewpoints. The multiple viewpoint policy includes attention mechanism, which is learned with an augmented dataset, and brings optimal viewpoints and robust visual embedding against occlusion and domain shift. Comparison of our results for different tasks and environments with those of previous studies revealed that our proposed method improves the success rate by up to 29.3 points. We also conduct ablation studies using our proposed method. Learning task-related viewpoints from the multiple viewpoints dataset increases robustness to occlusion than using a uniquely defined viewpoint. Focusing on task-related regions contributes to up to a 33.3-point improvement in the success rate against domain shift.

Via

Access Paper or Ask Questions

Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry

Jun 03, 2024

Takayuki Kanai, Igor Vasiljevic, Vitor Guizilini, Kazuhiro Shintani

Abstract:Monocular visual odometry is a key technology in a wide variety of autonomous systems. Relative to traditional feature-based methods, that suffer from failures due to poor lighting, insufficient texture, large motions, etc., recent learning-based SLAM methods exploit iterative dense bundle adjustment to address such failure cases and achieve robust accurate localization in a wide variety of real environments, without depending on domain-specific training data. However, despite its potential, learning-based SLAM still struggles with scenarios involving large motion and object dynamics. In this paper, we diagnose key weaknesses in a popular learning-based SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing various shortcomings of its optimization process. We then propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimation to initialize the dense bundle adjustment process, leading to robust visual odometry without the need to fine-tune the SLAM backbone. Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark. Code and pre-trained models will be released upon publication.

* 8 pages. 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Robust Self-Supervised Extrinsic Self-Calibration

Aug 07, 2023

Takayuki Kanai, Igor Vasiljevic, Vitor Guizilini, Adrien Gaidon, Rares Ambrus

Figure 1 for Robust Self-Supervised Extrinsic Self-Calibration

Figure 2 for Robust Self-Supervised Extrinsic Self-Calibration

Figure 3 for Robust Self-Supervised Extrinsic Self-Calibration

Figure 4 for Robust Self-Supervised Extrinsic Self-Calibration

Abstract:Autonomous vehicles and robots need to operate over a wide variety of scenarios in order to complete tasks efficiently and safely. Multi-camera self-supervised monocular depth estimation from videos is a promising way to reason about the environment, as it generates metrically scaled geometric predictions from visual data without requiring additional sensors. However, most works assume well-calibrated extrinsics to fully leverage this multi-camera setup, even though accurate and efficient calibration is still a challenging problem. In this work, we introduce a novel method for extrinsic calibration that builds upon the principles of self-supervised monocular depth and ego-motion learning. Our proposed curriculum learning strategy uses monocular depth and pose estimators with velocity supervision to estimate extrinsics, and then jointly learns extrinsic calibration along with depth and pose for a set of overlapping cameras rigidly attached to a moving vehicle. Experiments on a benchmark multi-camera dataset (DDAD) demonstrate that our method enables self-calibration in various scenes robustly and efficiently compared to a traditional vision-based pose estimation pipeline. Furthermore, we demonstrate the benefits of extrinsics self-calibration as a way to improve depth prediction via joint optimization.

* The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
* Project page: https://sites.google.com/view/tri-sesc

Via

Access Paper or Ask Questions

Third-party Evaluation of Robotic Hand Designs Using a Mechanical Glove

Sep 22, 2021

Takayuki Kanai, Yoshiyuki Ohmura, Akihiko Nagakubo, Yasuo Kuniyoshi

Figure 1 for Third-party Evaluation of Robotic Hand Designs Using a Mechanical Glove

Figure 2 for Third-party Evaluation of Robotic Hand Designs Using a Mechanical Glove

Figure 3 for Third-party Evaluation of Robotic Hand Designs Using a Mechanical Glove

Figure 4 for Third-party Evaluation of Robotic Hand Designs Using a Mechanical Glove

Abstract:A robotic hand design suitable for dexterity should be examined using functional tests. To achieve this, we designed a mechanical glove, which is a rigid wearable glove that enables us to develop the corresponding isomorphic robotic hand and evaluate its hardware properties. Subsequently, the effectiveness of multiple degrees-of-freedom (DOFs) was evaluated by human participants. Several fine motor skills were evaluated using the mechanical glove under two conditions: one- and three-DOF conditions. To the best of our knowledge, this is the first extensive evaluation method for robotic hand designs suitable for dexterity. (This paper was peer-reviewed and is the full translation from the Journal of the Robotics Society of Japan, Vol.39, No.6, pp.557-560, 2021.)

* Journal of the Robotics Society of Japan, Vol.39, No.6, pp.557-560, 2021
* 5 pages, 7 figures

Via

Access Paper or Ask Questions