Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael Milford

TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition

May 22, 2025

Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan

Abstract:TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop-closure. By fusing ternary weights with a learned activation-sparsity gate, the model can control computation by up to 40% at run-time without degrading performance (Recall@1). The proposed two-stage distillation pipeline preserves descriptor quality, letting it run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.

Via

Access Paper or Ask Questions

Long Exposure Localization in Darkness Using Consumer Cameras

Apr 23, 2025

Michael Milford, Ian Turner, Peter Corke

Abstract:In this paper we evaluate performance of the SeqSLAM algorithm for passive vision-based localization in very dark environments with low-cost cameras that result in massively blurred images. We evaluate the effect of motion blur from exposure times up to 10,000 ms from a moving car, and the performance of localization in day time from routes learned at night in two different environments. Finally we perform a statistical analysis that compares the baseline performance of matching unprocessed grayscale images to using patch normalization and local neighborhood normalization - the two key SeqSLAM components. Our results and analysis show for the first time why the SeqSLAM algorithm is effective, and demonstrate the potential for cheap camera-based localization systems that function despite extreme appearance change.

* 2013 IEEE International Conference on Robotics and Automation

Via

Access Paper or Ask Questions

VSLAM-LAB: A Comprehensive Framework for Visual SLAM Methods and Datasets

Apr 06, 2025

Alejandro Fontan, Tobias Fischer, Javier Civera, Michael Milford

Abstract:Visual Simultaneous Localization and Mapping (VSLAM) research faces significant challenges due to fragmented toolchains, complex system configurations, and inconsistent evaluation methodologies. To address these issues, we present VSLAM-LAB, a unified framework designed to streamline the development, evaluation, and deployment of VSLAM systems. VSLAM-LAB simplifies the entire workflow by enabling seamless compilation and configuration of VSLAM algorithms, automated dataset downloading and preprocessing, and standardized experiment design, execution, and evaluation--all accessible through a single command-line interface. The framework supports a wide range of VSLAM systems and datasets, offering broad compatibility and extendability while promoting reproducibility through consistent evaluation metrics and analysis tools. By reducing implementation complexity and minimizing configuration overhead, VSLAM-LAB empowers researchers to focus on advancing VSLAM methodologies and accelerates progress toward scalable, real-world solutions. We demonstrate the ease with which user-relevant benchmarks can be created: here, we introduce difficulty-level-based categories, but one could envision environment-specific or condition-specific categories.

Via

Access Paper or Ask Questions

Improving Visual Place Recognition with Sequence-Matching Receptiveness Prediction

Mar 10, 2025

Somayeh Hussaini, Tobias Fischer, Michael Milford

Abstract:In visual place recognition (VPR), filtering and sequence-based matching approaches can improve performance by integrating temporal information across image sequences, especially in challenging conditions. While these methods are commonly applied, their effects on system behavior can be unpredictable and can actually make performance worse in certain situations. In this work, we present a new supervised learning approach that learns to predict the per-frame sequence matching receptiveness (SMR) of VPR techniques, enabling the system to selectively decide when to trust the output of a sequence matching system. The approach is agnostic to the underlying VPR technique. Our approach predicts SMR-and hence significantly improves VPR performance-across a large range of state-of-the-art and classical VPR techniques (namely CosPlace, MixVPR, EigenPlaces, SALAD, AP-GeM, NetVLAD and SAD), and across three benchmark VPR datasets (Nordland, Oxford RobotCar, and SFU-Mountain). We also provide insights into a complementary approach that uses the predictor to replace discarded matches, as well as ablation studies, including an analysis of the interactions between our SMR predictor and the selected sequence length. We will release our code upon acceptance.

* 8 pages, 5 figures, under review

Via

Access Paper or Ask Questions

Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments

Mar 06, 2025

Beverley Gorry, Tobias Fischer, Michael Milford, Alejandro Fontan

Abstract:Effective monitoring of underwater ecosystems is crucial for tracking environmental changes, guiding conservation efforts, and ensuring long-term ecosystem health. However, automating underwater ecosystem management with robotic platforms remains challenging due to the complexities of underwater imagery, which pose significant difficulties for traditional visual localization methods. We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes. Furthermore, we introduce the SQUIDLE+ VPR Benchmark-the first large-scale underwater VPR benchmark designed to leverage an extensive collection of unstructured data from multiple robotic platforms, spanning time intervals from days to years. The dataset encompasses diverse trajectories, arbitrary overlap and diverse seafloor types captured under varying environmental conditions, including differences in depth, lighting, and turbidity. Our code is available at: https://github.com/bev-gorry/underloc

Via

Access Paper or Ask Questions

TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition

Mar 04, 2025

Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan

Figure 1 for TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition

Figure 2 for TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition

Figure 3 for TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition

Figure 4 for TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition

Abstract:Visual Place Recognition (VPR) localizes a query image by matching it against a database of geo-tagged reference images, making it essential for navigation and mapping in robotics. Although Vision Transformer (ViT) solutions deliver high accuracy, their large models often exceed the memory and compute budgets of resource-constrained platforms such as drones and mobile robots. To address this issue, we propose TeTRA, a ternary transformer approach that progressively quantizes the ViT backbone to 2-bit precision and binarizes its final embedding layer, offering substantial reductions in model size and latency. A carefully designed progressive distillation strategy preserves the representational power of a full-precision teacher, allowing TeTRA to retain or even surpass the accuracy of uncompressed convolutional counterparts, despite using fewer resources. Experiments on standard VPR benchmarks demonstrate that TeTRA reduces memory consumption by up to 69% compared to efficient baselines, while lowering inference latency by 35%, with either no loss or a slight improvement in recall@1. These gains enable high-accuracy VPR on power-constrained, memory-limited robotic platforms, making TeTRA an appealing solution for real-world deployment.

Via

Access Paper or Ask Questions

Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?

Jan 28, 2025

Sania Waheed, Bruno Ferrarini, Michael Milford, Sarvapali D. Ramchurn, Shoaib Ehsan

Figure 1 for Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?

Figure 2 for Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?

Figure 3 for Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?

Figure 4 for Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?

Abstract:The advances in Vision-Language models (VLMs) offer exciting opportunities for robotic applications involving image geo-localization, the problem of identifying the geo-coordinates of a place based on visual data only. Recent research works have focused on using a VLM as embeddings extractor for geo-localization, however, the most sophisticated VLMs may only be available as black boxes that are accessible through an API, and come with a number of limitations: there is no access to training data, model features and gradients; retraining is not possible; the number of predictions may be limited by the API; training on model outputs is often prohibited; and queries are open-ended. The utilization of a VLM as a stand-alone, zero-shot geo-localization system using a single text-based prompt is largely unexplored. To bridge this gap, this paper undertakes the first systematic study, to the best of our knowledge, to investigate the potential of some of the state-of-the-art VLMs as stand-alone, zero-shot geo-localization systems in a black-box setting with realistic constraints. We consider three main scenarios for this thorough investigation: a) fixed text-based prompt; b) semantically-equivalent text-based prompts; and c) semantically-equivalent query images. We also take into account the auto-regressive and probabilistic generation process of the VLMs when investigating their utility for geo-localization task by using model consistency as a metric in addition to traditional accuracy. Our work provides new insights in the capabilities of different VLMs for the above-mentioned scenarios.

* Submitted to IROS 2025

Via

Access Paper or Ask Questions

On Motion Blur and Deblurring in Visual Place Recognition

Dec 10, 2024

Timur Ismagilov, Bruno Ferrarini, Michael Milford, Tan Viet Tuyen Nguyen, SD Ramchurn, Shoaib Ehsan

Figure 1 for On Motion Blur and Deblurring in Visual Place Recognition

Figure 2 for On Motion Blur and Deblurring in Visual Place Recognition

Figure 3 for On Motion Blur and Deblurring in Visual Place Recognition

Figure 4 for On Motion Blur and Deblurring in Visual Place Recognition

Abstract:Visual Place Recognition (VPR) in mobile robotics enables robots to localize themselves by recognizing previously visited locations using visual data. While the reliability of VPR methods has been extensively studied under conditions such as changes in illumination, season, weather and viewpoint, the impact of motion blur is relatively unexplored despite its relevance not only in rapid motion scenarios but also in low-light conditions where longer exposure times are necessary. Similarly, the role of image deblurring in enhancing VPR performance under motion blur has received limited attention so far. This paper bridges these gaps by introducing a new benchmark designed to evaluate VPR performance under the influence of motion blur and image deblurring. The benchmark includes three datasets that encompass a wide range of motion blur intensities, providing a comprehensive platform for analysis. Experimental results with several well-established VPR and image deblurring methods provide new insights into the effects of motion blur and the potential improvements achieved through deblurring. Building on these findings, the paper proposes adaptive deblurring strategies for VPR, designed to effectively manage motion blur in dynamic, real-world scenarios.

Via

Access Paper or Ask Questions

A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition

Dec 09, 2024

Connor Malone, Somayeh Hussaini, Tobias Fischer, Michael Milford

Figure 1 for A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition

Figure 2 for A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition

Figure 3 for A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition

Figure 4 for A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition

Abstract:Visual Place Recognition (VPR) enables coarse localization by comparing query images to a reference database of geo-tagged images. Recent breakthroughs in deep learning architectures and training regimes have led to methods with improved robustness to factors like environment appearance change, but with the downside that the required training and/or matching compute scales with the number of distinct environmental conditions encountered. Here, we propose Hyperdimensional One Place Signatures (HOPS) to simultaneously improve the performance, compute and scalability of these state-of-the-art approaches by fusing the descriptors from multiple reference sets captured under different conditions. HOPS scales to any number of environmental conditions by leveraging the Hyperdimensional Computing framework. Extensive evaluations demonstrate that our approach is highly generalizable and consistently improves recall performance across all evaluated VPR methods and datasets by large margins. Arbitrarily fusing reference images without compute penalty enables numerous other useful possibilities, three of which we demonstrate here: descriptor dimensionality reduction with no performance penalty, stacking synthetic images, and coarse localization to an entire traverse or environmental section.

* Under Review

Via

Access Paper or Ask Questions

Look Ma, No Ground Truth! Ground-Truth-Free Tuning of Structure from Motion and Visual SLAM

Dec 02, 2024

Alejandro Fontan, Javier Civera, Tobias Fischer, Michael Milford

Abstract:Evaluation is critical to both developing and tuning Structure from Motion (SfM) and Visual SLAM (VSLAM) systems, but is universally reliant on high-quality geometric ground truth -- a resource that is not only costly and time-intensive but, in many cases, entirely unobtainable. This dependency on ground truth restricts SfM and SLAM applications across diverse environments and limits scalability to real-world scenarios. In this work, we propose a novel ground-truth-free (GTF) evaluation methodology that eliminates the need for geometric ground truth, instead using sensitivity estimation via sampling from both original and noisy versions of input images. Our approach shows strong correlation with traditional ground-truth-based benchmarks and supports GTF hyperparameter tuning. Removing the need for ground truth opens up new opportunities to leverage a much larger number of dataset sources, and for self-supervised and online tuning, with the potential for a data-driven breakthrough analogous to what has occurred in generative AI.

Via

Access Paper or Ask Questions