María Leyva-Vallina

Regressing Transformers for Data-efficient Visual Place Recognition

Jan 29, 2024

María Leyva-Vallina, Nicola Strisciuglio, Nicolai Petkov

Abstract: Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained to have small distance for similar images and larger distance for dissimilar ones in a latent space. However, this approach struggles to ensure accurate distance-based image similarity representation, particularly when training with binary pairwise labels, and complex re-ranking strategies are required. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking, offering data-efficient training and strong generalization across several benchmark datasets.

* Accepted for publication in ICRA 2024
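
As a rough illustration of the regression framing described in the abstract above, the sketch below penalizes the gap between the similarity of two image descriptors and a graded ground-truth label in [0, 1] (for example, a field-of-view overlap ratio). The abstract does not state the loss or the similarity measure, so the use of cosine similarity with a squared error, and the function names `cosine_similarity` and `similarity_regression_loss`, are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_regression_loss(desc_a, desc_b, target_similarity):
    """Squared error between descriptor similarity and a graded label.

    Assumption: `target_similarity` is a value in [0, 1], such as the
    field-of-view overlap between the two cameras. The paper's actual
    objective may differ from this squared-error form.
    """
    predicted = cosine_similarity(desc_a, desc_b)
    return (predicted - target_similarity) ** 2
```

With a graded target, a pair of descriptors is pushed toward a specific similarity value rather than simply "close" or "far", which is what lets the descriptor distances be used directly for ranking.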

Generalized Contrastive Optimization of Siamese Networks for Place Recognition

Mar 11, 2021

María Leyva-Vallina, Nicola Strisciuglio, Nicolai Petkov

Abstract: Visual place recognition is a challenging task in computer vision and a key component of camera-based localization and navigation systems. Recently, Convolutional Neural Networks (CNNs) achieved high results and good generalization capabilities. They are usually trained using pairs or triplets of images labeled as either similar or dissimilar, in a binary fashion. In practice, the similarity between two images is not binary, but rather continuous. Furthermore, training these CNNs is computationally complex and involves costly pair and triplet mining strategies. We propose a Generalized Contrastive loss (GCL) function that relies on image similarity as a continuous measure, and use it to train a siamese CNN. Furthermore, we propose three techniques for automatic annotation of image pairs with labels indicating their degree of similarity, and deploy them to re-annotate the MSLS, TB-Places, and 7Scenes datasets. We demonstrate that siamese CNNs trained using the GCL function and the improved annotations consistently outperform their binary counterparts. Our models trained on MSLS outperform the state-of-the-art methods, including NetVLAD, and generalize well on the Pittsburgh, TokyoTM and Tokyo 24/7 datasets. Furthermore, training a siamese network using the GCL function does not require complex pair mining. We release the source code at https://github.com/marialeyvallina/generalized_contrastive_loss.

* Under review
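
To make the idea of a contrastive loss with continuous similarity concrete, here is a minimal sketch for a single image pair. It assumes the loss keeps the shape of the classic contrastive loss, with the binary label replaced by a similarity value in [0, 1]; the exact formulation should be checked against the paper and the released source code. The margin value and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generalized_contrastive_loss(desc_a, desc_b, similarity, margin=0.5):
    """Contrastive loss weighted by a graded similarity in [0, 1].

    Assumption: the attract and repel terms of the standard contrastive
    loss are blended by `similarity`. The default `margin` is an
    illustrative placeholder, not a value taken from the paper.
    """
    d = euclidean_distance(desc_a, desc_b)
    attract = 0.5 * d ** 2                    # pulls similar pairs together
    repel = 0.5 * max(margin - d, 0.0) ** 2   # pushes dissimilar pairs apart
    return similarity * attract + (1.0 - similarity) * repel
```

Setting `similarity` to exactly 0 or 1 recovers the binary contrastive loss, while intermediate values make every pair contribute a partial attract and repel term.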
