Title: Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation

Sep 09, 2024

Muraleekrishna Gopinathan, Martin Masek, Jumana Abu-Khalaf, David Suter


Abstract: Embodied AI aims to develop robots that can understand and execute human language instructions, as well as communicate in natural language. To this end, we study the task of generating highly detailed navigational instructions for embodied robots to follow. Although recent studies have made significant progress in generating step-by-step instructions from sequences of images, the generated instructions lack variety in how they refer to objects and landmarks. Existing speaker models also learn strategies that exploit the evaluation metrics, obtaining high scores even for low-quality sentences. In this work, we propose SAS (Spatially-Aware Speaker), an instruction generator, or Speaker model, that utilises both structural and semantic knowledge of the environment to produce richer instructions. For training, we employ a reward learning method in an adversarial setting to avoid the systematic bias introduced by language evaluation metrics. Empirically, our method outperforms existing instruction generation models on standard evaluation metrics. Our code is available at https://github.com/gmuraleekrishna/SAS.
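The abstract does not specify how the adversarial reward is formulated. To illustrate the general idea, here is a minimal, generic sketch of GAN-style reward learning for text generation (in the spirit of SeqGAN). It is not the SAS implementation; the repository linked above is authoritative. The sketch assumes a hypothetical discriminator that outputs the probability that an instruction was written by a human. The reward is the log of that probability, and the speaker is updated with a REINFORCE loss that uses a mean-reward baseline. The function names `adversarial_reward` and `policy_gradient_loss` are hypothetical.

```python
import math
from typing import Sequence


def adversarial_reward(p_human: float, eps: float = 1e-8) -> float:
    """Reward = log D(x), where D(x) is a (hypothetical) discriminator's
    probability that instruction x is human-written. eps guards against log(0)."""
    return math.log(max(p_human, eps))


def policy_gradient_loss(
    token_log_probs: Sequence[Sequence[float]],
    discriminator_probs: Sequence[float],
) -> float:
    """REINFORCE loss over a batch of sampled instructions.

    token_log_probs[i] holds the speaker's log-probabilities for the tokens of
    sample i; discriminator_probs[i] is D(x_i) for that sample.
    """
    rewards = [adversarial_reward(p) for p in discriminator_probs]
    # Mean reward across the batch acts as a simple variance-reducing baseline.
    baseline = sum(rewards) / len(rewards)
    loss = 0.0
    for log_probs, reward in zip(token_log_probs, rewards):
        advantage = reward - baseline
        # Minimising -advantage * log pi(y) raises the likelihood of samples the
        # discriminator rates above the batch average and lowers the rest.
        loss -= advantage * sum(log_probs)
    return loss / len(rewards)


if __name__ == "__main__":
    # Two sampled instructions: the first looks human-like (D = 0.9), the second does not (D = 0.1).
    print(policy_gradient_loss([[-0.5], [-2.0]], [0.9, 0.1]))
```

Because the reward comes from a learned discriminator rather than a fixed n-gram metric such as BLEU, the speaker cannot simply optimise surface overlap with reference instructions. In an adversarial setup, the discriminator is retrained alongside the speaker.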
