Abstract:Following human instructions to explore and search for a specified target in an unfamiliar environment is a crucial skill for mobile service robots. Most of the previous works on object goal navigation have typically focused on a single input modality as the target, which may lead to limited consideration of language descriptions containing detailed attributes and spatial relationships. To address this limitation, we propose VLN-Game, a novel zero-shot framework for visual target navigation that can process object names and descriptive language targets effectively. To be more precise, our approach constructs a 3D object-centric spatial map by integrating pre-trained visual-language features with a 3D reconstruction of the physical environment. Then, the framework identifies the most promising areas to explore in search of potential target candidates. A game-theoretic vision language model is employed to determine which target best matches the given language description. Experiments conducted on the Habitat-Matterport 3D (HM3D) dataset demonstrate that the proposed framework achieves state-of-the-art performance in both object goal navigation and language-based navigation tasks. Moreover, we show that VLN-Game can be easily deployed on real-world robots. The success of VLN-Game highlights the promising potential of using game-theoretic methods with compact vision-language models to advance decision-making capabilities in robotic systems. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/vln-game.
Abstract:Navigating robots discreetly in human work environments while considering the possible privacy implications of robotic tasks presents significant challenges. Such scenarios are increasingly common, for instance, when robots transport sensitive objects that demand high levels of privacy in spaces crowded with human activities. While extensive research has been conducted on robotic path planning and social awareness, current robotic systems still lack the functionality of privacy-aware navigation in public environments. To address this, we propose a new framework for mobile robot navigation that leverages vision-language models to incorporate privacy awareness into adaptive path planning. Specifically, all potential paths from the starting point to the destination are generated using the A* algorithm. Concurrently, the vision-language model is used to infer the optimal path for privacy-awareness, given the environmental layout and the navigational instruction. This approach aims to minimize the robot's exposure to human activities and preserve the privacy of the robot and its surroundings. Experimental results on the S3DIS dataset demonstrate that our framework significantly enhances mobile robots' privacy awareness of navigation in human-shared public environments. Furthermore, we demonstrate the practical applicability of our framework by successfully navigating a robotic platform through real-world office environments. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/privacy-aware-nav.
Abstract:In this work, we propose a high-order regularization method to solve the ill-conditioned problems in robot localization. Numerical solutions to robot localization problems are often unstable when the problems are ill-conditioned. A typical way to solve ill-conditioned problems is regularization, and a classical regularization method is the Tikhonov regularization. It is shown that the Tikhonov regularization can be seen as a low-order case of our method. We find that the proposed method is superior to the Tikhonov regularization in approximating some ill-conditioned inverse problems, such as robot localization problems. The proposed method overcomes the over-smoothing problem in the Tikhonov regularization as it can use more than one term in the approximation of the matrix inverse, and an explanation for the over-smoothing of the Tikhonov regularization is given. Moreover, one a priori criterion which improves the numerical stability of the ill-conditioned problem is proposed to obtain an optimal regularization matrix. As most of the regularization solutions are biased, we also provide two bias-correction techniques for the proposed high-order regularization. The simulation and experiment results using a sensor network in a 3D environment are discussed, demonstrating the performance of the proposed method.
Abstract:We design a distributed coordinated guiding vector field (CGVF) for a group of robots to achieve ordering-flexible motion coordination while maneuvering on a desired two-dimensional (2D) surface. The CGVF is characterized by three terms, i.e., a convergence term to drive the robots to converge to the desired surface, a propagation term to provide a traversing direction for maneuvering on the desired surface, and a coordinated term to achieve the surface motion coordination with an arbitrary ordering of the robotic group. By setting the surface parameters as additional virtual coordinates, the proposed approach eliminates the potential singularity of the CGVF and enables both the global convergence to the desired surface and the maneuvering on the surface from all possible initial conditions. The ordering-flexible surface motion coordination is realized by each robot to share with its neighbors only two virtual coordinates, i.e. that of a given target and that of its own, which reduces the communication and computation cost in multi-robot surface navigation. Finally, the effectiveness of the CGVF is substantiated by extensive numerical simulations.
Abstract:In this paper, we propose a distributed guiding-vector-field (DGVF) algorithm for a team of robots to form a spontaneous-ordering platoon moving along a predefined desired path in the n-dimensional Euclidean space. Particularly, by adding a path parameter as an additional virtual coordinate to each robot, the DGVF algorithm can eliminate the singular points where the vector fields vanish, and govern robots to approach a closed and even self-intersecting desired path. Then, the interactions among neighboring robots and a virtual target robot through their virtual coordinates enable the realization of the desired platoon; in particular, relative parametric displacements can be achieved with arbitrary ordering sequences. Rigorous analysis is provided to guarantee the global convergence to the spontaneous-ordering platoon on the common desired path from any initial positions. 2D experiments using three HUSTER-0.3 unmanned surface vessels (USVs) are conducted to validate the practical effectiveness of the proposed DGVF algorithm, and 3D numerical simulations are presented to demonstrate its effectiveness and robustness when tackling higher-dimensional multi-robot path-navigation missions and some robots breakdown.
Abstract:In advanced human-robot interaction tasks, visual target navigation is crucial for autonomous robots navigating unknown environments. While numerous approaches have been developed in the past, most are designed for single-robot operations, which often suffer from reduced efficiency and robustness due to environmental complexities. Furthermore, learning policies for multi-robot collaboration are resource-intensive. To address these challenges, we propose Co-NavGPT, an innovative framework that integrates Large Language Models (LLMs) as a global planner for multi-robot cooperative visual target navigation. Co-NavGPT encodes the explored environment data into prompts, enhancing LLMs' scene comprehension. It then assigns exploration frontiers to each robot for efficient target search. Experimental results on Habitat-Matterport 3D (HM3D) demonstrate that Co-NavGPT surpasses existing models in success rates and efficiency without any learning process, demonstrating the vast potential of LLMs in multi-robot collaboration domains. The supplementary video, prompts, and code can be accessed via the following link: \href{https://sites.google.com/view/co-navgpt}{https://sites.google.com/view/co-navgpt}.
Abstract:Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task rely on learning the priors during the training and typically require significant expensive resources and time for learning. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLM) to impart common sense for object searching. Specifically, we introduce two paradigms: (i) zero-shot and (ii) feed-forward approaches that use language to find the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates the notable zero-shot generalization and transfer capabilities from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.
Abstract:This work focuses on the problem of visual target navigation, which is very important for autonomous robots as it is closely related to high-level tasks. To find a special object in unknown environments, classical and learning-based approaches are fundamental components of navigation that have been investigated thoroughly in the past. However, due to the difficulty in the representation of complicated scenes and the learning of the navigation policy, previous methods are still not adequate, especially for large unknown scenes. Hence, we propose a novel framework for visual target navigation using the frontier semantic policy. In this proposed framework, the semantic map and the frontier map are built from the current observation of the environment. Using the features of the maps and object category, deep reinforcement learning enables to learn a frontier semantic policy which can be used to select a frontier cell as a long-term goal to explore the environment efficiently. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and efficiency. Ablation analysis also indicates that the proposed approach learns a more efficient exploration policy based on the frontiers. A demonstration is provided to verify the applicability of applying our model to real-world transfer. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/fsevn.
Abstract:Traditional evaluation metrics for classification in natural language processing such as accuracy and area under the curve fail to differentiate between models with different predictive behaviors despite their similar performance metrics. We introduce sensitivity score, a metric that scrutinizes models' behaviors at the vocabulary level to provide insights into disparities in their decision-making logic. We assess the sensitivity score on a set of representative words in the test set using two classifiers trained for hospital readmission classification with similar performance statistics. Our experiments compare the decision-making logic of clinicians and classifiers based on rank correlations of sensitivity scores. The results indicate that the language model's sensitivity score aligns better with the professionals than the xgboost classifier on tf-idf embeddings, which suggests that xgboost uses some spurious features. Overall, this metric offers a novel perspective on assessing models' robustness by quantifying their discrepancy with professional opinions. Our code is available on GitHub (https://github.com/nyuolab/Model_Sensitivity).
Abstract:We propose coordinating guiding vector fields to achieve two tasks simultaneously with a team of robots: first, the guidance and navigation of multiple robots to possibly different paths or surfaces typically embedded in 2D or 3D; second, their motion coordination while tracking their prescribed paths or surfaces. The motion coordination is defined by desired parametric displacements between robots on the path or surface. Such a desired displacement is achieved by controlling the virtual coordinates, which correspond to the path or surface's parameters, between guiding vector fields. Rigorous mathematical guarantees underpinned by dynamical systems theory and Lyapunov theory are provided for the effective distributed motion coordination and navigation of robots on paths or surfaces from all initial positions. As an example for practical robotic applications, we derive a control algorithm from the proposed coordinating guiding vector fields for a Dubins-car-like model with actuation saturation. Our proposed algorithm is distributed and scalable to an arbitrary number of robots. Furthermore, extensive illustrative simulations and fixed-wing aircraft outdoor experiments validate the effectiveness and robustness of our algorithm.