Vision-language navigation (VLN) is a challenging task due to the large search space in the environment. To address this problem, previous works have proposed methods that fine-tune large models pretrained on large-scale datasets. However, conventional fine-tuning requires extra human-labeled navigation data and lacks self-exploration capabilities in environments, which hinders generalization to unseen scenes. To improve fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which self-explores environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP). Our method fully utilizes the knowledge learned by CLIP to build an in-domain dataset through self-exploration without human labeling. Unlike conventional fine-tuning, we introduce prompt-based learning to achieve fast adaptation of language embeddings, which substantially improves learning efficiency by leveraging prior knowledge. By combining the automatic synthesis of trajectory-instruction pairs in any environment without human supervision with efficient prompt-based learning, our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE. Both qualitative and quantitative results show that ProbES significantly improves the generalization ability of the navigation model.
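To make the pipeline concrete, the following is a minimal sketch (not the authors' released code) of how CLIP could turn a self-explored trajectory into a templated instruction: each rendered view along a sampled path is matched against a small landmark vocabulary with CLIP, and the top-scoring labels are slotted into a fixed prompt template. The landmark list, template wording, and image paths are illustrative placeholders; only the OpenAI CLIP API usage follows the library's documented interface.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Candidate landmark labels (hypothetical vocabulary, not from the paper).
landmarks = ["sofa", "dining table", "staircase", "bathroom door", "bookshelf"]
text_tokens = clip.tokenize([f"a photo of a {w}" for w in landmarks]).to(device)

def describe_view(image_path: str) -> str:
    """Return the landmark label CLIP scores highest for this view."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text_tokens)
        best = logits_per_image.softmax(dim=-1).argmax(dim=-1).item()
    return landmarks[best]

def synthesize_instruction(view_paths: list[str]) -> str:
    """Fill a fixed template with CLIP-detected landmarks along a sampled trajectory."""
    detected = [describe_view(p) for p in view_paths]
    steps = ", then ".join(f"walk past the {d}" for d in detected)
    return f"{steps.capitalize()}, and stop there."

# Example: views rendered along one randomly sampled trajectory (paths are placeholders).
# print(synthesize_instruction(["view_0.jpg", "view_1.jpg", "view_2.jpg"]))
```

The resulting trajectory-instruction pairs can then serve as in-domain pretraining data, with prompt-based learning used to adapt the language embeddings of the navigation model rather than fine-tuning all of its parameters.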