Title: Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations

Sep 09, 2024

Xuesong Zhang, Jia Li, Yunbo Xu, Zhenzhen Hu, Richang Hong

Abstract: Autonomous navigation for an embodied agent guided by natural language instructions remains a formidable challenge in vision-and-language navigation (VLN). Despite remarkable recent progress in learning fine-grained and multifarious visual representations, the tendency to overfit to the training environments leads to unsatisfactory generalization performance. In this work, we present a versatile Multi-Branch Architecture (MBA) aimed at exploring and exploiting diverse visual inputs. Specifically, we introduce three distinct visual variants: ground-truth depth images, visual inputs integrated with incongruent views, and visual inputs infused with random noise. These variants are intended to enrich the diversity of visual input representation and to prevent overfitting to the original RGB observations. To adaptively fuse these varied inputs, the proposed MBA extends a base agent model into a multi-branch variant, where each branch processes a different visual input. Surprisingly, even random noise can further enhance navigation performance in unseen environments. Extensive experiments conducted on three VLN benchmarks (R2R, REVERIE, SOON) demonstrate that our proposed method equals or even surpasses state-of-the-art results. The source code will be publicly available.

* 5 pages, 2 figures, submitted to ICASSP 2025
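
The multi-branch idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fusion mechanism, the feature dimensions, or the number of branches, so everything below is an assumption. In particular, the function names (`add_noise`, `mix_incongruent`, `branch_scores`, `fuse`, `demo`), the softmax-weighted fusion, the use of four branches (RGB plus the three variants), and the random stand-in features are all hypothetical.

```python
import math
import random


def add_noise(features, rng, scale=0.1):
    """'Random noise' variant: perturb a feature vector with Gaussian noise."""
    return [x + rng.gauss(0.0, scale) for x in features]


def mix_incongruent(features, other_view, ratio=0.5):
    """'Incongruent views' variant: blend in features from an unrelated view."""
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(features, other_view)]


def branch_scores(candidates, weights):
    """Hypothetical stand-in for one branch: one score per candidate view."""
    return [sum(w * x for w, x in zip(weights, c)) for c in candidates]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(per_branch_scores, gate_logits):
    """Assumed fusion rule: softmax-weighted sum of the branches' scores."""
    gates = softmax(gate_logits)
    n = len(per_branch_scores[0])
    return [sum(g * s[i] for g, s in zip(gates, per_branch_scores)) for i in range(n)]


def demo(seed=0, num_candidates=3, dim=4):
    """Run four branches (RGB, depth, incongruent, noise) and fuse their scores."""
    rng = random.Random(seed)

    def rand_views():
        return [[rng.random() for _ in range(dim)] for _ in range(num_candidates)]

    rgb = rand_views()        # stand-in for RGB view features
    depth = rand_views()      # stand-in for depth-image features
    unrelated = rand_views()  # stand-in for views from another location
    inputs = [
        rgb,
        depth,
        [mix_incongruent(v, o) for v, o in zip(rgb, unrelated)],
        [add_noise(v, rng) for v in rgb],
    ]
    # One weight vector per branch; in a real agent these would be learned.
    weights = [[rng.random() for _ in range(dim)] for _ in inputs]
    scores = [branch_scores(x, w) for x, w in zip(inputs, weights)]
    gate_logits = [0.0] * len(inputs)  # equal gates; a real model would learn them
    return fuse(scores, gate_logits)


if __name__ == "__main__":
    print(demo())
```

With all gate logits equal, the fusion reduces to a plain average of the branches; a learned gate would let the agent weight the perturbed inputs differently per step.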
