Yishu Li

MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction

Nov 02, 2024

Wang Zhao, Jiachen Liu, Sheng Zhang, Yishu Li, Sili Chen, Sharon X Huang, Yong-Jin Liu, Hengkai Guo

Abstract: This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane. Unlike previous robust estimator-based works (which require multiple images or RGB-D input) and learning-based works (which suffer from domain shift), MonoPlane combines the best of both worlds and establishes a plane reconstruction pipeline based on monocular geometric cues, resulting in accurate, robust and scalable 3D plane detection and reconstruction in the wild. Specifically, we first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image. These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance. We exploit effective 3D point proximity and model such proximity via a graph within RANSAC to guide the plane fitting from noisy monocular depths, followed by image-level multi-plane joint optimization to improve the consistency among all plane instances. We further design a simple but effective pipeline to extend this single-view solution to sparse-view 3D plane reconstruction. Extensive experiments on multiple datasets demonstrate our superior zero-shot generalizability over baselines, achieving state-of-the-art plane reconstruction performance in a transfer setting. Our code is available at https://github.com/thuzhaowang/MonoPlane.

* IROS 2024 (oral)
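
To illustrate the "sequentially fit each plane instance" step in the abstract, here is a minimal sketch of plain sequential RANSAC plane fitting on a 3D point cloud. It is not the authors' implementation: it omits the proximity graph, the surface-normal cues, and the joint optimization, and it assumes the points have already been back-projected from a depth map. The function names (`fit_plane_ransac`, `fit_planes_sequentially`) and all parameters and thresholds are hypothetical.

```python
import numpy as np


def fit_plane_ransac(points, n_iters=200, inlier_thresh=0.02, seed=None):
    """Fit one plane n.x + d = 0 to an (N, 3) array with plain RANSAC.

    Hypothetical helper: vanilla RANSAC, without the proximity guidance
    described in the abstract.
    """
    rng = np.random.default_rng(seed)
    best_plane = None
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: three distinct points define a candidate plane.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # Collinear sample, no unique plane.
            continue
        normal = normal / norm
        d = -float(normal @ sample[0])
        # Inliers are points within inlier_thresh of the candidate plane.
        inliers = np.abs(points @ normal + d) < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = (normal, d), inliers
    return best_plane, best_inliers


def fit_planes_sequentially(points, max_planes=3, min_inliers=50, **ransac_kwargs):
    """Fit planes one at a time, removing each plane's inliers before the next."""
    remaining = np.arange(len(points))
    planes = []
    for _ in range(max_planes):
        if len(remaining) < 3:
            break
        plane, inliers = fit_plane_ransac(points[remaining], **ransac_kwargs)
        if plane is None or inliers.sum() < min_inliers:
            break
        # Store the plane together with the indices of its supporting points.
        planes.append((plane, remaining[inliers]))
        remaining = remaining[~inliers]
    return planes
```

Each returned entry pairs a `(normal, d)` plane with the indices of the points assigned to it. A proximity-guided variant would presumably restrict or weight the sampling and inlier steps using neighbouring points, which this sketch does not attempt.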

FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation

Oct 09, 2024

Yishu Li, Wen Hui Leng, Yiming Fang, Ben Eisner, David Held

Abstract: We introduce a novel approach to manipulating articulated objects under ambiguity, such as opening a door, where multi-modality and occlusions create ambiguities about the opening side and direction. Multi-modality arises when the way to open a fully closed door (push, pull, or slide) is uncertain, or when the side from which it should be opened is uncertain. Occlusions further obscure the door's shape from certain angles, creating additional ambiguity while the occlusion persists. To tackle these challenges, we propose a history-aware diffusion network that models the multi-modal distribution of the articulated object and uses history to disambiguate actions and make stable predictions under occlusions. Experiments and analysis demonstrate the state-of-the-art performance of our method, with specific improvements in ambiguity-caused failure modes. Our project website is available at https://flowbothd.github.io/.

* Accepted to CoRL 2024
