Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Koichi Nishiwaki

ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping

Apr 15, 2025

Shun Iwase, Zubair Irshad, Katherine Liu, Vitor Guizilini, Robert Lee, Takuya Ikeda, Ayako Amma, Koichi Nishiwaki, Kris Kitani, Rares Ambrus(+1 more)

Abstract:Robotic grasping is a cornerstone capability of embodied systems. Many methods directly output grasps from partial information without modeling the geometry of the scene, leading to suboptimal motion and even collisions. To address these issues, we introduce ZeroGrasp, a novel framework that simultaneously performs 3D reconstruction and grasp pose prediction in near real-time. A key insight of our method is that occlusion reasoning and modeling the spatial relationships between objects is beneficial for both accurate reconstruction and grasping. We couple our method with a novel large-scale synthetic dataset, which comprises 1M photo-realistic images, high-resolution 3D reconstructions and 11.3B physically-valid grasp pose annotations for 12K objects from the Objaverse-LVIS dataset. We evaluate ZeroGrasp on the GraspNet-1B benchmark as well as through real-world robot experiments. ZeroGrasp achieves state-of-the-art performance and generalizes to novel real-world objects by leveraging synthetic data.

* Published at CVPR 2025, Webpage: https://sh8.io/#/zerograsp

Via

Access Paper or Ask Questions

A Planar-Symmetric SO(3) Representation for Learning Grasp Detection

Oct 07, 2024

Tianyi Ko, Takuya Ikeda, Hiroya Sato, Koichi Nishiwaki

Abstract:Planar-symmetric hands, such as parallel grippers, are widely adopted in both research and industrial fields. Their symmetry, however, introduces ambiguity and discontinuity in the SO(3) representation, which hinders both the training and inference of neural-network-based grasp detectors. We propose a novel SO(3) representation that can parametrize a pair of planar-symmetric poses with a single parameter set by leveraging the 2D Bingham distribution. We also detail a grasp detector based on our representation, which provides a more consistent rotation output. An intensive evaluation with multiple grippers and objects in both the simulation and the real world quantitatively shows our approach's contribution.

* Accepted by CoRL2024

Via

Access Paper or Ask Questions

ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion

Apr 15, 2024

Tianhan Xu, Takuya Ikeda, Koichi Nishiwaki

$Figure 1 for ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion$

$Figure 2 for ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion$

$Figure 3 for ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion$

$Figure 4 for ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion$

Abstract:In this paper, we propose a method to segment and recover a static, clean background and multiple 360$^\circ$ objects from observations of scenes at different timestamps. Recent works have used neural radiance fields to model 3D scenes and improved the quality of novel view synthesis, while few studies have focused on modeling the invisible or occluded parts of the training images. These under-reconstruction parts constrain both scene editing and rendering view selection, thereby limiting their utility for synthetic data generation for downstream tasks. Our basic idea is that, by observing the same set of objects in various arrangement, so that parts that are invisible in one scene may become visible in others. By fusing the visible parts from each scene, occlusion-free rendering of both background and foreground objects can be achieved. We decompose the multi-scene fusion task into two main components: (1) objects/background segmentation and alignment, where we leverage point cloud-based methods tailored to our novel problem formulation; (2) radiance fields fusion, where we introduce visibility field to quantify the visible information of radiance fields, and propose visibility-aware rendering for the fusion of series of scenes, ultimately obtaining clean background and 360$^\circ$ object rendering. Comprehensive experiments were conducted on synthetic and real datasets, and the results demonstrate the effectiveness of our method.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

Feb 20, 2024

Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad, Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki

Figure 1 for DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

Figure 2 for DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

Figure 3 for DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

Figure 4 for DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

Abstract:This paper addresses the challenging problem of category-level pose estimation. Current state-of-the-art methods for this task face challenges when dealing with symmetric objects and when attempting to generalize to new environments solely through synthetic data training. In this work, we address these challenges by proposing a probabilistic model that relies on diffusion to estimate dense canonical maps crucial for recovering partial object shapes as well as establishing correspondences essential for pose estimation. Furthermore, we introduce critical components to enhance performance by leveraging the strength of the diffusion models with multi-modal input representations. We demonstrate the effectiveness of our method by testing it on a range of real datasets. Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization qualities, outperforming baselines, even those specifically trained on the target domain.

* 9 pages, 9 figures

Via

Access Paper or Ask Questions

Gravity-aware Grasp Generation with Implicit Grasp Mode Selection for Underactuated Hands

Dec 19, 2023

Tianyi Ko, Takuya Ikeda, Thomas Stewart, Robert Lee, Koichi Nishiwaki

Abstract:To overcome the mechanical limitation of parallel-jaw grippers, in this paper, we present a gravity-aware grasp generation that supports both precision grasp and power grasp of underactuated hands. We propose a novel approach to generate a large-scale dataset with a gravity-rejection score and experimentally confirm that the combination of that score and classical success/fail binary classification is powerful: the former encourages stable grasps, such as power grasps or grasping the center of mass, while the latter rejects invalid grasps, such as colliding with other objects or attempting to grasp parts that are too large for the gripper. We also propose a rotation representation that is continuous on SO(3) and considers the grasp's physical meaning. Our simulation and real robot evaluation experiments demonstrate significant improvements from the baseline works, especially for heavy objects.

Via

Access Paper or Ask Questions

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

Nov 23, 2023

Pengyuan Wang, Takuya Ikeda, Robert Lee, Koichi Nishiwaki

Abstract:Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but are typically hindered by the need for large datasets of either pose-labelled real images or carefully tuned photorealistic simulators. This can be avoided by using only geometry inputs such as depth images to reduce the domain-gap but these approaches suffer from a lack of semantic information, which can be vital in the pose estimation problem. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model.Our approach projects 2D features from this foundation model into 3D for a single object model per category, and then performs matching against this for new single view observations of unseen object instances with a trained matching network. This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance. We demonstrate this with a rich evaluation, showing improved performance over prior methods with a fraction of the data required.

Via

Access Paper or Ask Questions

A Probabilistic Rotation Representation for Symmetric Shapes With an Efficiently Computable Bingham Loss Function

May 30, 2023

Hiroya Sato, Takuya Ikeda, Koichi Nishiwaki

Abstract:In recent years, a deep learning framework has been widely used for object pose estimation. While quaternion is a common choice for rotation representation, it cannot represent the ambiguity of the observation. In order to handle the ambiguity, the Bingham distribution is one promising solution. However, it requires complicated calculation when yielding the negative log-likelihood (NLL) loss. An alternative easy-to-implement loss function has been proposed to avoid complex computations but has difficulty expressing symmetric distribution. In this paper, we introduce a fast-computable and easy-to-implement NLL loss function for Bingham distribution. We also create the inference network and show that our loss function can capture the symmetric property of target objects from their point clouds.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: substantial text overlap with arXiv:2203.04456

Via

Access Paper or Ask Questions

Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection

May 25, 2022

Tianyi Ko, Koichi Nishiwaki, Koji Terada, Yusuke Tanaka, Shun Mitsumata, Ryuichi Katagiri, Taketo Junko, Naoshi Horiba, Hideyoshi Igata, Kazue Mizuno

Figure 1 for Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection

Figure 2 for Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection

Figure 3 for Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection

Figure 4 for Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection

Abstract:In this paper, we present a robotic device for mouse tail vein injection. We propose a mouse holding mechanism to realize vein injection without anesthetizing the mouse, which consists of a tourniquet, vacuum port, and adaptive tail-end fixture. The position of the target vein in 3D space is reconstructed from a high-resolution stereo vision. The vein is detected by a simple but robust vein line detector. Thanks to the proposed two-staged calibration process, the total time for the injection process is limited to 1.5 minutes, despite that the position of needle and tail vein varies for each trial. We performed an injection experiment targeting 40 mice and succeeded to inject saline to 37 of them, resulting 92.5% success ratio.

* accepted to ICRA2022 (7 pages, 11 figures, 2 tables)

Via

Access Paper or Ask Questions

Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation

Mar 09, 2022

Hiroya Sato, Takuya Ikeda, Koichi Nishiwaki

Figure 1 for Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation

Figure 2 for Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation

Figure 3 for Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation

Figure 4 for Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation

Abstract:In recent years, a deep learning framework has been widely used for object pose estimation. While quaternion is a common choice for rotation representation of 6D pose, it cannot represent an uncertainty of the observation. In order to handle the uncertainty, Bingham distribution is one promising solution because this has suitable features, such as a smooth representation over SO(3), in addition to the ambiguity representation. However, it requires the complex computation of the normalizing constants. This is the bottleneck of loss computation in training neural networks based on Bingham representation. As such, we propose a fast-computable and easy-to-implement loss function for Bingham distribution. We also show not only to examine the parametrization of Bingham distribution but also an application based on our loss function.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Sim2Real Instance-Level Style Transfer for 6D Pose Estimation

Mar 03, 2022

Takuya Ikeda, Suomi Tanishige, Ayako Amma, Michael Sudano, Hervé Audren, Koichi Nishiwaki

Figure 1 for Sim2Real Instance-Level Style Transfer for 6D Pose Estimation

Figure 2 for Sim2Real Instance-Level Style Transfer for 6D Pose Estimation

Figure 3 for Sim2Real Instance-Level Style Transfer for 6D Pose Estimation

Figure 4 for Sim2Real Instance-Level Style Transfer for 6D Pose Estimation

Abstract:In recent years, synthetic data has been widely used in the training of 6D pose estimation networks, in part because it automatically provides perfect annotation at low cost. However, there are still non-trivial domain gaps, such as differences in textures/materials, between synthetic and real data. These gaps have a measurable impact on performance. To solve this problem, we introduce a simulation to reality (sim2real) instance-level style transfer for 6D pose estimation network training. Our approach transfers the style of target objects individually, from synthetic to real, without human intervention. This improves the quality of synthetic data for training pose estimation networks. We also propose a complete pipeline from data collection to the training of a pose estimation network and conduct extensive evaluation on a real-world robotic platform. Our evaluation shows significant improvement achieved by our method in both pose estimation performance and the realism of images adapted by the style transfer.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions