Abstract:Effective robot navigation in dynamic environments is a challenging task that depends on generating precise control actions at high frequencies. Recent advancements have framed navigation as a goal-conditioned control problem. Current state-of-the-art methods for goal-based navigation, such as diffusion policies, either generate sub-goal images or robot control actions to guide robots. However, despite their high accuracy, these methods incur substantial computational costs, which limits their practicality for real-time applications. Recently, Conditional Flow Matching(CFM) has emerged as a more efficient and robust generalization of diffusion. In this work we explore the use of CFM to learn action policies that help the robot navigate its environment. Our results demonstrate that CFM is able to generate highly accurate robot actions. CFM not only matches the accuracy of diffusion policies but also significantly improves runtime performance. This makes it particularly advantageous for real-time robot navigation, where swift, reliable action generation is vital for collision avoidance and smooth operation. By leveraging CFM, we provide a pathway to more scalable, responsive robot navigation systems capable of handling the demands of dynamic and unpredictable environments.
Abstract:Learning-based manipulation policies from image inputs often show weak task transfer capabilities. In contrast, visual servoing methods allow efficient task transfer in high-precision scenarios while requiring only a few demonstrations. In this work, we present a framework that formulates the visual servoing task as graph traversal. Our method not only extends the robustness of visual servoing, but also enables multitask capability based on a few task-specific demonstrations. We construct demonstration graphs by splitting existing demonstrations and recombining them. In order to traverse the demonstration graph in the inference case, we utilize a similarity function that helps select the best demonstration for a specific task. This enables us to compute the shortest path through the graph. Ultimately, we show that recombining demonstrations leads to higher task-respective success. We present extensive simulation and real-world experimental results that demonstrate the efficacy of our approach.
Abstract:Localization is paramount for autonomous robots. While camera and LiDAR-based approaches have been extensively investigated, they are affected by adverse illumination and weather conditions. Therefore, radar sensors have recently gained attention due to their intrinsic robustness to such conditions. In this paper, we propose RaLF, a novel deep neural network-based approach for localizing radar scans in a LiDAR map of the environment, by jointly learning to address both place recognition and metric localization. RaLF is composed of radar and LiDAR feature encoders, a place recognition head that generates global descriptors, and a metric localization head that predicts the 3-DoF transformation between the radar scan and the map. We tackle the place recognition task by learning a shared embedding space between the two modalities via cross-modal metric learning. Additionally, we perform metric localization by predicting pixel-level flow vectors that align the query radar scan with the LiDAR map. We extensively evaluate our approach on multiple real-world driving datasets and show that RaLF achieves state-of-the-art performance for both place recognition and metric localization. Moreover, we demonstrate that our approach can effectively generalize to different cities and sensor setups than the ones used during training. We make the code and trained models publicly available at http://ralf.cs.uni-freiburg.de.