Frank Guan

Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images

Mar 17, 2025

Tianhao Wu, Chuanxia Zheng, Frank Guan, Andrea Vedaldi, Tat-Jen Cham

Abstract: Most image-based 3D object reconstructors assume that objects are fully visible, ignoring occlusions that commonly occur in real-world scenarios. In this paper, we introduce Amodal3R, a conditional 3D generative model designed to reconstruct 3D objects from partial observations. We start from a "foundation" 3D generative model and extend it to recover plausible 3D geometry and appearance from occluded objects. We introduce a mask-weighted multi-head cross-attention mechanism followed by an occlusion-aware attention layer that explicitly leverages occlusion priors to guide the reconstruction process. We demonstrate that, by training solely on synthetic data, Amodal3R learns to recover full 3D objects even in the presence of occlusions in real scenes. It substantially outperforms existing methods that independently perform 2D amodal completion followed by 3D reconstruction, thereby establishing a new benchmark for occlusion-aware 3D reconstruction.

* Project Page: https://sm0kywu.github.io/Amodal3R/
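
The abstract does not spell out how the mask enters the attention computation. As rough intuition only, one simple way to make cross-attention favour visible image tokens is to add the log of a per-token visibility weight to the attention logits before the softmax. The single-head sketch below illustrates that idea in plain Python. The function name, the `visibility` argument, and the log-weight formulation are assumptions made for illustration; they are not taken from the paper, which describes a multi-head variant.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_weighted_cross_attention(queries, keys, values, visibility, eps=1e-6):
    """Single-head cross-attention biased by a visibility mask (illustrative).

    queries:    list of d-dim vectors (e.g. 3D latent tokens)
    keys:       one d-dim vector per image token
    values:     one vector per image token
    visibility: weight in [0, 1] per image token (1 = visible, 0 = occluded)
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score plus log-visibility bias (assumed form).
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + math.log(w + eps)
            for k, w in zip(keys, visibility)
        ]
        attn = softmax(logits)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(a * v[j] for a, v in zip(attn, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

With two image tokens that have identical keys, setting `visibility=[1.0, 0.0]` pushes almost all attention onto the first token, while `visibility=[1.0, 1.0]` splits it evenly.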

AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans

Nov 27, 2024

Dillon Loh, Tomasz Bednarz, Xinxing Xia, Frank Guan

Abstract: Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on static settings, real-world navigation must often contend with dynamic human obstacles. Hence, we propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN), which seeks to narrow this gap. AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles, adding a layer of complexity to navigation tasks that mimic the real world. To support exploration of this task, we also present the AdaVLN simulator and the AdaR2R datasets. The AdaVLN simulator enables easy inclusion of fully animated human models directly into common datasets like Matterport3D. We also introduce a "freeze-time" mechanism for both the navigation task and simulator, which pauses world state updates during agent inference, enabling fair comparisons and experimental reproducibility across different hardware. We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
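
The "freeze-time" idea can be pictured with a toy loop: the world advances only by a fixed number of simulation ticks after each agent action, so wall-clock inference time never leaks into the world state. The sketch below is a minimal illustration under that reading. `FrozenWorld`, `run_episode`, `ticks_per_action`, and the one-dimensional "human" are hypothetical names and simplifications, not the AdaVLN simulator's API.

```python
import time


class FrozenWorld:
    """Toy world with one human moving along a line (hypothetical)."""

    def __init__(self, human_speed=0.5, dt=0.1):
        self.human_x = 0.0
        self.human_speed = human_speed
        self.dt = dt
        self.ticks = 0

    def step(self, n_ticks):
        # Advance the world by a fixed number of simulation ticks.
        for _ in range(n_ticks):
            self.human_x += self.human_speed * self.dt
            self.ticks += 1


def run_episode(policy, n_actions, ticks_per_action=5):
    world = FrozenWorld()
    for _ in range(n_actions):
        observation = world.human_x   # snapshot; the world is paused from here
        policy(observation)           # inference may take arbitrarily long
        world.step(ticks_per_action)  # world resumes for a fixed tick budget
    return world.human_x, world.ticks


def fast_policy(observation):
    return "forward"


def slow_policy(observation):
    time.sleep(0.01)  # stands in for slower hardware
    return "forward"
```

Because the world state depends only on the number of actions taken, `run_episode(fast_policy, 3)` and `run_episode(slow_policy, 3)` end in the same state even though the second takes longer in wall-clock time.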
