Abstract:The generalization of the end-to-end deep reinforcement learning (DRL) for object-goal visual navigation is a long-standing challenge since object classes and placements vary in new test environments. Learning domain-independent visual representation is critical for enabling the trained DRL agent with the ability to generalize to unseen scenes and objects. In this letter, a target-directed attention network (TDANet) is proposed to learn the end-to-end object-goal visual navigation policy with zero-shot ability. TDANet features a novel target attention (TA) module that learns both the spatial and semantic relationships among objects to help TDANet focus on the most relevant observed objects to the target. With the Siamese architecture (SA) design, TDANet distinguishes the difference between the current and target states and generates the domain-independent visual representation. To evaluate the navigation performance of TDANet, extensive experiments are conducted in the AI2-THOR embodied AI environment. The simulation results demonstrate a strong generalization ability of TDANet to unseen scenes and target objects, with higher navigation success rate (SR) and success weighted by length (SPL) than other state-of-the-art models.
Abstract:Autonomous navigation in unknown environments without a global map is a long-standing challenge for mobile robots. While deep reinforcement learning (DRL) has attracted a rapidly growing interest in solving such an autonomous navigation problem for its generalization capability, DRL typically leads to a mediocre navigation performance in practice due to the gap between the training scene and the actual test scene. Most existing work focuses on tuning the algorithm to enhance its transferability, whereas few investigates how to quantify or measure the gap therebetween. This letter presents a local map-based deep Q-network (DQN) navigation algorithm, which uses local maps converted from 2D LiDAR data as observations without a global map. More importantly, this letter proposes a new transferability metric -- the scene similarity calculated from an improved image template matching algorithm to measure the similarity between the training and test scenes. With a wheeled robot as the case study platform, both simulation and real-world experiments are conducted in a total of 20 different scenes. The case study results successfully validate the local map-based navigation algorithm as well as the similarity metric in predicting the transferability or success rate of the algorithm.