Title: Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation

Apr 21, 2023

Hongcheng Wang, Yuxuan Wang, Fangwei Zhong, Mingdong Wu, Jianwei Zhang, Yizhou Wang, Hao Dong

Abstract: Visual-audio navigation (VAN) is attracting growing attention from the robotics community due to its broad applications, e.g., household robots and rescue robots. In this task, an embodied agent must search for and navigate to a sound source using egocentric visual and audio observations. However, existing methods are limited in two respects: 1) poor generalization to unheard sound categories; 2) sample inefficiency in training. To address these two problems, we propose a brain-inspired plug-and-play method that learns a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We design two auxiliary tasks, one for each of these characteristics, to accelerate learning of such a representation. With these two auxiliary tasks, the agent learns a spatially correlated representation of visual and audio inputs that transfers to environments with novel sounds and maps. Experimental results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when transferred zero-shot to scenes with unseen maps and unheard sound categories.
