Jianeng Wang

Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset

Jun 04, 2025

Zirui Wang, Wenjing Bian, Xinghui Li, Yifu Tao, Jianeng Wang, Maurice Fallon, Victor Adrian Prisacariu

Abstract: We introduce Oxford Day-and-Night, a large-scale, egocentric dataset for novel view synthesis (NVS) and visual relocalisation under challenging lighting conditions. Existing datasets often lack crucial combinations of features such as ground-truth 3D geometry, wide-ranging lighting variation, and full 6DoF motion. Oxford Day-and-Night addresses these gaps by leveraging Meta ARIA glasses to capture egocentric video and applying multi-session SLAM to estimate camera poses, reconstruct 3D point clouds, and align sequences captured under varying lighting conditions, including both day and night. The dataset spans over 30 $\mathrm{km}$ of recorded trajectories and covers an area of 40,000 $\mathrm{m}^2$, offering a rich foundation for egocentric 3D vision research. It supports two core benchmarks, NVS and relocalisation, providing a unique platform for evaluating models in realistic and diverse environments.

* Project page: https://oxdan.active.vision/

Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation

Mar 21, 2024

Jianeng Wang, Matias Mattamala, Christina Kassab, Lintong Zhang, Maurice Fallon

Abstract: Exoskeletons for daily use by those with mobility impairments are being developed. They will require accurate and robust scene understanding systems. Current research has used vision to identify immediate terrain and geometric obstacles; however, these approaches are constrained to detections directly in front of the user and are limited to classifying a finite range of terrain types (e.g., stairs, ramps and level-ground). This paper presents Exosense, a vision-centric scene understanding system which is capable of generating rich, globally-consistent elevation maps, incorporating both semantic and terrain traversability information. It features an elastic Atlas mapping framework associated with a visual SLAM pose graph, embedded with open-vocabulary room labels from a Vision-Language Model (VLM). The device's design includes a wide field-of-view (FoV) fisheye multi-camera system to mitigate the challenges introduced by the exoskeleton walking pattern. We demonstrate the system's robustness to the challenges of typical periodic walking gaits, and its ability to construct accurate semantically-rich maps in indoor settings. Additionally, we showcase its potential for motion planning, providing a step towards safe navigation for exoskeletons.

* 8 pages, 10 figures

Event-based Visual Odometry with Full Temporal Resolution via Continuous-time Gaussian Process Regression

Jun 01, 2023

Jianeng Wang, Jonathan D. Gammell

Abstract: Event-based cameras asynchronously capture individual visual changes in a scene. This makes them more robust than traditional frame-based cameras to highly dynamic motions and poor illumination. It also means that every measurement in a scene can occur at a unique time. Handling these different measurement times is a major challenge of using event-based cameras. It is often addressed in visual odometry (VO) pipelines by approximating temporally close measurements as occurring at one common time. This grouping simplifies the estimation problem but sacrifices the inherent temporal resolution of event-based cameras. This paper instead presents a complete stereo VO pipeline that estimates directly with individual event-measurement times without requiring any grouping or approximation. It uses continuous-time trajectory estimation to maintain the temporal fidelity and asynchronous nature of event-based cameras through Gaussian process regression with a physically motivated prior. Its performance is evaluated on the MVSEC dataset, where it achieves 7.9e-3 and 5.9e-3 RMS relative error on two independent sequences, outperforming the existing publicly available event-based stereo VO pipeline by two and four times, respectively.

* Submitted to IEEE Robotics and Automation Letters (RA-L). Manuscript #23-1314. 8 pages, 4 figures
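
The grouping approximation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's pipeline: the fixed-length windowing scheme, the function names `group_events` and `max_time_error`, and the example timestamps are all assumptions made for illustration. It only shows how assigning temporally close events to one common time discards per-event timing, with an error of at most half the window length per event.

```python
import numpy as np


def group_events(timestamps, window):
    """Replace each event time with the midpoint of its fixed-length window.

    Hypothetical grouping scheme, used only to illustrate the approximation
    of treating temporally close measurements as occurring at one common time.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    bins = np.floor(timestamps / window)  # index of the window each event falls in
    return (bins + 0.5) * window          # common time assigned to that window


def max_time_error(timestamps, window):
    """Largest timing error introduced by grouping (at most window / 2)."""
    timestamps = np.asarray(timestamps, dtype=float)
    grouped = group_events(timestamps, window)
    return float(np.max(np.abs(grouped - timestamps)))


if __name__ == "__main__":
    # Illustrative event times in seconds (made up, not from any dataset).
    events = [0.0, 0.001, 0.0049, 0.0051]
    print(group_events(events, window=0.005))
    print(max_time_error(events, window=0.005))
```

A continuous-time estimator, as the abstract describes, would instead query the trajectory at each original timestamp, so no such timing error is introduced.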
