Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tomáš Pivoňka

Multi-Platform Teach-and-Repeat Navigation by Visual Place Recognition Based on Deep-Learned Local Features

Mar 17, 2025

Václav Truhlařík, Tomáš Pivoňka, Michal Kasarda, Libor Přeučil

Abstract:Uniform and variable environments still remain a challenge for stable visual localization and mapping in mobile robot navigation. One of the possible approaches suitable for such environments is appearance-based teach-and-repeat navigation, relying on simplified localization and reactive robot motion control - all without a need for standard mapping. This work brings an innovative solution to such a system based on visual place recognition techniques. Here, the major contributions stand in the employment of a new visual place recognition technique, a novel horizontal shift computation approach, and a multi-platform system design for applications across various types of mobile robots. Secondly, a new public dataset for experimental testing of appearance-based navigation methods is introduced. Moreover, the work also provides real-world experimental testing and performance comparison of the introduced navigation system against other state-of-the-art methods. The results confirm that the new system outperforms existing methods in several testing scenarios, is capable of operation indoors and outdoors, and exhibits robustness to day and night scene variations.

* 6 pages, 5 figures

Via

Access Paper or Ask Questions

On Model-Free Re-ranking for Visual Place Recognition with Deep Learned Local Features

Oct 24, 2024

Tomáš Pivoňka, Libor Přeučil

Abstract:Re-ranking is the second stage of a visual place recognition task, in which the system chooses the best-matching images from a pre-selected subset of candidates. Model-free approaches compute the image pair similarity based on a spatial comparison of corresponding local visual features, eliminating the need for computationally expensive estimation of a model describing transformation between images. The article focuses on model-free re-ranking based on standard local visual features and their applicability in long-term autonomy systems. It introduces three new model-free re-ranking methods that were designed primarily for deep-learned local visual features. These features evince high robustness to various appearance changes, which stands as a crucial property for use with long-term autonomy systems. All the introduced methods were employed in a new visual place recognition system together with the D2-net feature detector (Dusmanu, 2019) and experimentally tested with diverse, challenging public datasets. The obtained results are on par with current state-of-the-art methods, affirming that model-free approaches are a viable and worthwhile path for long-term visual place recognition.

* 12 pages, 9 figures

Via

Access Paper or Ask Questions

CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving

Apr 27, 2024

Junyi Gu, Mauro Bellone, Tomáš Pivoňka, Raivo Sell

Abstract:Critical research about camera-and-LiDAR-based semantic object segmentation for autonomous driving significantly benefited from the recent development of deep learning. Specifically, the vision transformer is the novel ground-breaker that successfully brought the multi-head-attention mechanism to computer vision applications. Therefore, we propose a vision-transformer-based network to carry out camera-LiDAR fusion for semantic segmentation applied to autonomous driving. Our proposal uses the novel progressive-assemble strategy of vision transformers on a double-direction network and then integrates the results in a cross-fusion strategy over the transformer decoder layers. Unlike other works in the literature, our camera-LiDAR fusion transformers have been evaluated in challenging conditions like rain and low illumination, showing robust performance. The paper reports the segmentation results over the vehicle and human classes in different modalities: camera-only, LiDAR-only, and camera-LiDAR fusion. We perform coherent controlled benchmark experiments of CLFT against other networks that are also designed for semantic segmentation. The experiments aim to evaluate the performance of CLFT independently from two perspectives: multimodal sensor fusion and backbone architectures. The quantitative assessments show our CLFT networks yield an improvement of up to 10\% for challenging dark-wet conditions when comparing with Fully-Convolutional-Neural-Network-based (FCN) camera-LiDAR fusion neural network. Contrasting to the network with transformer backbone but using single modality input, the all-around improvement is 5-10\%.

* Submitted to IEEE Transactions on Intelligent Vehicles

Via

Access Paper or Ask Questions