Tobias Fischer

Event-LAB: Towards Standardized Evaluation of Neuromorphic Localization Methods

Sep 18, 2025

Adam D. Hines, Alejandro Fontan, Michael Milford, Tobias Fischer

Abstract: Event-based localization research and datasets are a rapidly growing area of interest, with a tenfold increase in the cumulative total number of published papers on this topic over the past 10 years. Whilst the rapid expansion in the field is exciting, it brings with it an associated challenge: a growth in the variety of required code and package dependencies as well as data formats, making comparisons difficult and cumbersome for researchers to implement reliably. To address this challenge, we present Event-LAB: a new and unified framework for running several event-based localization methodologies across multiple datasets. Event-LAB is implemented using the Pixi package and dependency manager, which enables a single command-line installation and invocation for combinations of localization methods and datasets. To demonstrate the capabilities of the framework, we implement two common event-based localization pipelines: Visual Place Recognition (VPR) and Simultaneous Localization and Mapping (SLAM). We demonstrate the ability of the framework to systematically visualize and analyze the results of multiple methods and datasets, revealing key insights such as the association of parameters that control event collection counts and window sizes for frame generation to large variations in performance. The results and analysis demonstrate the importance of fairly comparing methodologies with consistent event image generation parameters. Our Event-LAB framework provides this ability for the research community, by contributing a streamlined workflow for easily setting up multiple conditions.

* 8 pages, 6 figures, under review
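
The two frame-generation parameters named in the abstract above, the number of events collected per frame and the temporal window size, can be illustrated with a minimal sketch. This is not Event-LAB code: the function names, the `(t_us, x, y, polarity)` event layout, and the integer microsecond timestamps are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp in microseconds, x, y, polarity).
Event = Tuple[int, int, int, int]
Frame = List[List[int]]


def accumulate(events: List[Event], width: int, height: int) -> Frame:
    """Count events per pixel to form one event frame (polarity is ignored)."""
    frame = [[0] * width for _ in range(height)]
    for _, x, y, _ in events:
        frame[y][x] += 1
    return frame


def frames_by_count(events: List[Event], count: int,
                    width: int, height: int) -> List[Frame]:
    """One frame per `count` consecutive events; a trailing partial chunk is dropped."""
    full = len(events) - len(events) % count
    return [accumulate(events[i:i + count], width, height)
            for i in range(0, full, count)]


def frames_by_window(events: List[Event], window_us: int,
                     width: int, height: int) -> List[Frame]:
    """One frame per `window_us` microseconds, measured from the first event.

    Assumes `events` is sorted by timestamp.
    """
    if not events:
        return []
    t0 = events[0][0]
    n_frames = (events[-1][0] - t0) // window_us + 1
    buckets: List[List[Event]] = [[] for _ in range(n_frames)]
    for ev in events:
        buckets[(ev[0] - t0) // window_us].append(ev)
    return [accumulate(bucket, width, height) for bucket in buckets]
```

With a fixed event count, every frame holds the same number of events but covers a varying time span; with a fixed window, every frame covers the same time span but may be dense, sparse, or empty. The same event stream can therefore produce quite different images under the two settings.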

Are All Marine Species Created Equal? Performance Disparities in Underwater Object Detection

Aug 26, 2025

Melanie Wille, Tobias Fischer, Scarlett Raine

Abstract: Underwater object detection is critical for monitoring marine ecosystems but poses unique challenges, including degraded image quality, imbalanced class distribution, and distinct visual characteristics. Not every species is detected equally well, yet underlying causes remain unclear. We address two key research questions: 1) What factors beyond data quantity drive class-specific performance disparities? 2) How can we systematically improve detection of under-performing marine species? We manipulate the DUO dataset to separate the object detection task into localization and classification and investigate the under-performance of the scallop class. Localization analysis using YOLO11 and TIDE finds that foreground-background discrimination is the most problematic stage regardless of data quantity. Classification experiments reveal persistent precision gaps even with balanced data, indicating intrinsic feature-based challenges beyond data scarcity and inter-class dependencies. We recommend imbalanced distributions when prioritizing precision, and balanced distributions when prioritizing recall. Improving under-performing classes should focus on algorithmic advances, especially within localization modules. We publicly release our code and datasets.

* 10 pages

VSLAM-LAB: A Comprehensive Framework for Visual SLAM Methods and Datasets

Apr 06, 2025

Alejandro Fontan, Tobias Fischer, Javier Civera, Michael Milford

Abstract: Visual Simultaneous Localization and Mapping (VSLAM) research faces significant challenges due to fragmented toolchains, complex system configurations, and inconsistent evaluation methodologies. To address these issues, we present VSLAM-LAB, a unified framework designed to streamline the development, evaluation, and deployment of VSLAM systems. VSLAM-LAB simplifies the entire workflow by enabling seamless compilation and configuration of VSLAM algorithms, automated dataset downloading and preprocessing, and standardized experiment design, execution, and evaluation, all accessible through a single command-line interface. The framework supports a wide range of VSLAM systems and datasets, offering broad compatibility and extendability while promoting reproducibility through consistent evaluation metrics and analysis tools. By reducing implementation complexity and minimizing configuration overhead, VSLAM-LAB empowers researchers to focus on advancing VSLAM methodologies and accelerates progress toward scalable, real-world solutions. We demonstrate the ease with which user-relevant benchmarks can be created: here, we introduce difficulty-level-based categories, but one could envision environment-specific or condition-specific categories.

FlowR: Flowing from Sparse to Dense 3D Reconstructions

Apr 02, 2025

Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Varma Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, Peter Kontschieder

Abstract: 3D Gaussian splatting enables high-quality novel view synthesis (NVS) at real-time frame rates. However, its quality drops sharply as we depart from the training views. Thus, dense captures are needed to match the high-quality expectations of some applications, e.g., Virtual Reality (VR). However, such dense captures are very laborious and expensive to obtain. Existing works have explored using 2D generative models to alleviate this requirement by distillation or generating additional training views. These methods are often conditioned only on a handful of reference input views and thus do not fully exploit the available 3D information, leading to inconsistent generation results and reconstruction artifacts. To tackle this problem, we propose a multi-view flow matching model that learns a flow to connect novel view renderings from possibly sparse reconstructions to renderings that we expect from dense reconstructions. This enables augmenting scene captures with novel, generated views to improve reconstruction quality. Our model is trained on a novel dataset of 3.6M image pairs and can process up to 45 views at 540x960 resolution (91K tokens) on one H100 GPU in a single forward pass. Our pipeline consistently improves NVS in sparse- and dense-view scenarios, leading to higher-quality reconstructions than prior works across multiple widely used NVS benchmarks.

* Project page is available at https://tobiasfshr.github.io/pub/flowr

Improving Visual Place Recognition with Sequence-Matching Receptiveness Prediction

Mar 10, 2025

Somayeh Hussaini, Tobias Fischer, Michael Milford

Abstract: In visual place recognition (VPR), filtering and sequence-based matching approaches can improve performance by integrating temporal information across image sequences, especially in challenging conditions. While these methods are commonly applied, their effects on system behavior can be unpredictable and can actually make performance worse in certain situations. In this work, we present a new supervised learning approach that learns to predict the per-frame sequence matching receptiveness (SMR) of VPR techniques, enabling the system to selectively decide when to trust the output of a sequence matching system. The approach is agnostic to the underlying VPR technique. Our approach predicts SMR, and hence significantly improves VPR performance, across a large range of state-of-the-art and classical VPR techniques (namely CosPlace, MixVPR, EigenPlaces, SALAD, AP-GeM, NetVLAD and SAD), and across three benchmark VPR datasets (Nordland, Oxford RobotCar, and SFU-Mountain). We also provide insights into a complementary approach that uses the predictor to replace discarded matches, as well as ablation studies, including an analysis of the interactions between our SMR predictor and the selected sequence length. We will release our code upon acceptance.

* 8 pages, 5 figures, under review

Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments

Mar 06, 2025

Beverley Gorry, Tobias Fischer, Michael Milford, Alejandro Fontan

Abstract: Effective monitoring of underwater ecosystems is crucial for tracking environmental changes, guiding conservation efforts, and ensuring long-term ecosystem health. However, automating underwater ecosystem management with robotic platforms remains challenging due to the complexities of underwater imagery, which pose significant difficulties for traditional visual localization methods. We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes. Furthermore, we introduce the SQUIDLE+ VPR Benchmark, the first large-scale underwater VPR benchmark designed to leverage an extensive collection of unstructured data from multiple robotic platforms, spanning time intervals from days to years. The dataset encompasses diverse trajectories, arbitrary overlap, and diverse seafloor types captured under varying environmental conditions, including differences in depth, lighting, and turbidity. Our code is available at: https://github.com/bev-gorry/underloc

A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition

Dec 09, 2024

Connor Malone, Somayeh Hussaini, Tobias Fischer, Michael Milford

Abstract: Visual Place Recognition (VPR) enables coarse localization by comparing query images to a reference database of geo-tagged images. Recent breakthroughs in deep learning architectures and training regimes have led to methods with improved robustness to factors like environment appearance change, but with the downside that the required training and/or matching compute scales with the number of distinct environmental conditions encountered. Here, we propose Hyperdimensional One Place Signatures (HOPS) to simultaneously improve the performance, compute and scalability of these state-of-the-art approaches by fusing the descriptors from multiple reference sets captured under different conditions. HOPS scales to any number of environmental conditions by leveraging the Hyperdimensional Computing framework. Extensive evaluations demonstrate that our approach is highly generalizable and consistently improves recall performance across all evaluated VPR methods and datasets by large margins. Arbitrarily fusing reference images without compute penalty enables numerous other useful possibilities, three of which we demonstrate here: descriptor dimensionality reduction with no performance penalty, stacking synthetic images, and coarse localization to an entire traverse or environmental section.

* Under Review
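
The descriptor fusion described in the abstract above can be illustrated with the bundling operation from Hyperdimensional Computing. This is a minimal sketch, assuming that descriptors of the same place under different conditions are L2-normalized, summed element-wise, and re-normalized. The abstract does not state whether HOPS uses exactly these steps, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def bundle(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Fuse same-length descriptors by element-wise summation (assumed bundling).

    The result has the same dimensionality as each input descriptor.
    """
    if not descriptors:
        raise ValueError("need at least one descriptor")
    dim = len(descriptors[0])
    if any(len(d) != dim for d in descriptors):
        raise ValueError("all descriptors must have the same length")
    normalized = [l2_normalize(d) for d in descriptors]
    summed = [sum(column) for column in zip(*normalized)]
    return l2_normalize(summed)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Because the fused descriptor keeps the dimensionality of its inputs, the size of the reference database and the per-query matching cost in this sketch do not grow with the number of conditions that are bundled together.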

Look Ma, No Ground Truth! Ground-Truth-Free Tuning of Structure from Motion and Visual SLAM

Dec 02, 2024

Alejandro Fontan, Javier Civera, Tobias Fischer, Michael Milford

Abstract: Evaluation is critical to both developing and tuning Structure from Motion (SfM) and Visual SLAM (VSLAM) systems, but is universally reliant on high-quality geometric ground truth, a resource that is not only costly and time-intensive but, in many cases, entirely unobtainable. This dependency on ground truth restricts SfM and SLAM applications across diverse environments and limits scalability to real-world scenarios. In this work, we propose a novel ground-truth-free (GTF) evaluation methodology that eliminates the need for geometric ground truth, instead using sensitivity estimation via sampling from both original and noisy versions of input images. Our approach shows strong correlation with traditional ground-truth-based benchmarks and supports GTF hyperparameter tuning. Removing the need for ground truth opens up new opportunities to leverage a much larger number of dataset sources, and for self-supervised and online tuning, with the potential for a data-driven breakthrough analogous to what has occurred in generative AI.

Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications

Nov 18, 2024

Scarlett Raine, Frederic Maire, Niko Suenderhauf, Tobias Fischer

Abstract: Underwater surveys provide long-term data for informing management strategies, monitoring coral reef health, and estimating blue carbon stocks. Advances in broad-scale survey methods, such as robotic underwater vehicles, have increased the range of marine surveys but generate large volumes of imagery requiring analysis. Computer vision methods such as semantic segmentation aid automated image analysis, but typically rely on fully supervised training with extensive labelled data. While ground truth label masks for tasks like street scene segmentation can be quickly and affordably generated by non-experts through crowdsourcing services like Amazon Mechanical Turk, ecology presents greater challenges. The complexity of underwater images, coupled with the specialist expertise needed to accurately identify species at the pixel level, makes this process costly, time-consuming, and heavily dependent on domain experts. In recent years, some works have performed automated analysis of underwater imagery, and a smaller number of studies have focused on weakly supervised approaches which aim to reduce the expert-provided labelled data required. This survey focuses on approaches which reduce dependency on human expert input, while reviewing the prior and related approaches to position these works in the wider field of underwater perception. Further, we offer an overview of coastal ecosystems and the challenges of underwater imagery. We provide background on weakly and self-supervised deep learning and integrate these elements into a taxonomy that centres on the intersection of underwater monitoring, computer vision, and deep learning, while motivating approaches for weakly supervised deep learning with reduced dependency on domain expert data annotations. Lastly, the survey examines available datasets and platforms, and identifies gaps, barriers, and opportunities for automating underwater surveys.

* 70 pages, 20 figures

Exploring Emerging Trends and Research Opportunities in Visual Place Recognition

Nov 18, 2024

Antonios Gasteratos, Konstantinos A. Tsintotas, Tobias Fischer, Yiannis Aloimonos, Michael Milford

Abstract: Visual-based recognition, such as image classification and object detection, is a long-standing challenge in the computer vision and robotics communities. For roboticists, since knowledge of the environment is a prerequisite for complex navigation tasks, visual place recognition is vital for most localization implementations, as well as for re-localization and loop closure detection pipelines within simultaneous localization and mapping (SLAM). More specifically, it corresponds to the system's ability to identify and match a previously visited location using computer vision tools. Towards developing novel techniques with enhanced accuracy and robustness, and motivated by the success of natural language processing methods, researchers have recently turned their attention to vision-language models, which integrate visual and textual data.

* 2 pages, 1 figure. 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA@40), Rotterdam, Netherlands, September 23-26, 2024
