Abstract: Traditional robot navigation systems primarily rely on occupancy grid maps and laser-based sensing, as exemplified by the popular move_base package in ROS. Unlike robots, humans navigate not only through spatial awareness and physical distances but also by integrating external information, such as elevator maintenance updates from public notification boards, and experiential knowledge, like the need for special access through certain doors. With the development of Large Language Models (LLMs), which possess text understanding and intelligence close to human performance, there is now an opportunity to infuse robot navigation systems with a level of understanding akin to human cognition. In this study, we propose using osmAG (Area Graph in OpenStreetMap textual format), an innovative semantic topometric hierarchical map representation, to bridge the gap between the capabilities of ROS move_base and the contextual understanding offered by LLMs. Our methodology employs LLMs as an actual copilot in robot navigation, enabling the integration of a broader range of informational inputs while maintaining the robustness of traditional robotic navigation systems. Our code, demo, map, and experiment results can be accessed at https://github.com/xiexiexiaoxiexie/Intelligent-LiDAR-Navigation-LLM-as-Copilot.
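To illustrate the copilot idea described above, here is a minimal hedged sketch, not the paper's implementation: the LLM only turns textual information (a maintenance notice plus osmAG area names) into a symbolic list of areas to avoid, while path planning itself stays with a conventional planner. The functions query_llm and plan_path, as well as all area names, are hypothetical placeholders introduced for illustration.

```python
# Minimal sketch of "LLM as copilot": the LLM reasons over text, the planner
# stays classical. query_llm and plan_path are hypothetical stand-ins.

def query_llm(prompt: str) -> str:
    """Stand-in for a call to an LLM service; returns a comma-separated list."""
    return "Elevator_A"  # e.g., the model infers the elevator is unavailable

def plan_path(start: str, goal: str, blocked: set[str]) -> list[str]:
    """Stand-in for a traditional planner over the osmAG passage graph."""
    route = [start, "Corridor_2F", "Elevator_A", goal]
    alt = [start, "Corridor_2F", "Stairs_B", goal]
    return alt if any(area in blocked for area in route) else route

areas = ["Room_201", "Corridor_2F", "Elevator_A", "Stairs_B", "Lobby_1F"]
notice = "Facility notice: the elevator is under maintenance until Friday."

prompt = (f"Areas in the building: {', '.join(areas)}.\n"
          f"Notice: {notice}\n"
          "Which areas should the robot avoid? Answer with area names only.")
blocked = {name.strip() for name in query_llm(prompt).split(",")}
print(plan_path("Room_201", "Lobby_1F", blocked))
# -> ['Room_201', 'Corridor_2F', 'Stairs_B', 'Lobby_1F']
```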
Abstract: Recently, Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge for situations that cannot be pre-programmed. Generally speaking, mobile robots need to understand maps to execute tasks such as localization or navigation. In this letter, we address the problem of enabling LLMs to comprehend the Area Graph, a text-based map representation, in order to enhance their applicability in the field of mobile robotics. The Area Graph is a hierarchical, topometric semantic map representation that uses polygons to demarcate areas such as rooms, corridors or buildings. In contrast to commonly used map representations, such as occupancy grid maps or point clouds, osmAG (Area Graph in OpenStreetMap format) is stored in an XML textual format that is naturally readable by LLMs. Furthermore, conventional robotic algorithms such as localization and path planning are compatible with osmAG, making this map representation comprehensible to LLMs, traditional robotic algorithms, and humans alike. Our experiments show that, with a proper map representation, LLMs possess the capability to understand maps and answer queries based on that understanding. After simple fine-tuning, our LLaMA2 models surpass ChatGPT-3.5 in tasks involving topology and hierarchy understanding. Our dataset, dataset generation code, and fine-tuned LoRA adapters can be accessed at https://github.com/xiefujing/LLM-osmAG-Comprehension.
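To make concrete why an OpenStreetMap-style XML map is readable both by standard tooling and by an LLM as plain text, the following is a minimal sketch; the osmAG tag keys (osmAG:type, osmAG:areaType, osmAG:parent) and area names shown here are illustrative assumptions, not the exact schema, which is defined in the linked repository.

```python
# Illustrative osmAG-style snippet in OpenStreetMap XML and a plain parse of it.
# Tag keys and values are hypothetical examples, not the authors' exact schema.
import xml.etree.ElementTree as ET

OSMAG_SNIPPET = """
<osm version="0.6">
  <node id="-1" lat="31.17847" lon="121.59025"/>
  <node id="-2" lat="31.17849" lon="121.59031"/>
  <way id="-101">
    <nd ref="-1"/>
    <nd ref="-2"/>
    <tag k="osmAG:type" v="area"/>
    <tag k="osmAG:areaType" v="room"/>
    <tag k="osmAG:parent" v="Floor_2"/>
    <tag k="name" v="Room_201"/>
  </way>
</osm>
"""

root = ET.fromstring(OSMAG_SNIPPET)
for way in root.iter("way"):
    tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
    print(tags.get("name"), "is a", tags.get("osmAG:areaType"),
          "inside", tags.get("osmAG:parent"))
# -> Room_201 is a room inside Floor_2
```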
Abstract: Lifelong indoor localization in a given map is the basis for the navigation of autonomous mobile robots. In this letter, we address the problem of robust localization in cluttered indoor environments, such as office spaces and corridors, using 3D LiDAR point clouds in a given Area Graph, a hierarchical, topometric semantic map representation that uses polygons to demarcate areas such as rooms, corridors or buildings. This representation is very compact, can represent different floors of a building through its hierarchy, and provides semantic information that helps with localization, such as the poses of doors and glass. In contrast, commonly used map representations, such as occupancy grid maps or point clouds, lack these features and require frequent updates in response to environmental changes (e.g., moved furniture), whereas our approach matches against lifelong architectural features such as walls and doors. To this end, we apply filtering to remove clutter from the 3D input point cloud and then employ further scoring and weighting functions for localization. Given a broad initial guess from WiFi localization, our experiments show that our global localization and weighted point-to-line ICP pose tracking perform very well, even when compared to localization and SLAM algorithms that use the current, feature-rich cluttered map.
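To make the "weighted point-to-line ICP" wording concrete, here is a minimal sketch, not the paper's implementation: each filtered LiDAR point is matched to its nearest Area Graph wall segment and contributes a squared point-to-line distance scaled by a per-point weight (e.g., down-weighted near glass). All function names, weights, and coordinates below are illustrative assumptions.

```python
# Sketch of a weighted point-to-line cost for one pose hypothesis (2D case).
# Names and the weighting scheme are illustrative, not the paper's code.
import numpy as np

def point_to_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to wall segment a-b."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def weighted_icp_cost(points, weights, wall_segments):
    """Sum of weighted squared point-to-line distances to the nearest wall."""
    cost = 0.0
    for p, w in zip(points, weights):
        d = min(point_to_segment_distance(p, a, b) for a, b in wall_segments)
        cost += w * d ** 2
    return cost

# Example: two points against one wall; the second is down-weighted (e.g., glass).
walls = [(np.array([0.0, 0.0]), np.array([5.0, 0.0]))]
pts = [np.array([1.0, 0.2]), np.array([4.0, 1.0])]
print(weighted_icp_cost(pts, [1.0, 0.3], walls))  # 1.0*0.04 + 0.3*1.0 = 0.34
```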