Edgar Sucar

Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction

Mar 20, 2025

Edgar Sucar, Zihang Lai, Eldar Insafutdinov, Andrea Vedaldi

Abstract: DUSt3R has recently shown that one can reduce many tasks in multi-view geometry, including estimating camera intrinsics and extrinsics, reconstructing the scene in 3D, and establishing image correspondences, to the prediction of a pair of viewpoint-invariant point maps, i.e., pixel-aligned point clouds defined in a common reference frame. This formulation is elegant and powerful, but unable to tackle dynamic scenes. To address this challenge, we introduce the concept of Dynamic Point Maps (DPM), extending standard point maps to support 4D tasks such as motion segmentation, scene flow estimation, 3D object tracking, and 2D correspondence. Our key intuition is that, when time is introduced, there are several possible spatial and time references that can be used to define the point maps. We identify a minimal subset of such combinations that can be regressed by a network to solve the subtasks mentioned above. We train a DPM predictor on a mixture of synthetic and real data and evaluate it across diverse benchmarks for video depth prediction, dynamic point cloud reconstruction, 3D scene flow and object pose tracking, achieving state-of-the-art performance. Code, models and additional results are available at https://www.robots.ox.ac.uk/~vgg/research/dynamic-point-maps/.

* Web page: https://www.robots.ox.ac.uk/~vgg/research/dynamic-point-maps/

Real-time Mapping of Physical Scene Properties with an Autonomous Robot Experimenter

Oct 31, 2022

Iain Haughton, Edgar Sucar, Andre Mouton, Edward Johns, Andrew J. Davison

Abstract: Neural fields can be trained from scratch to represent the shape and appearance of 3D scenes efficiently. It has also been shown that they can densely map correlated properties such as semantics, via sparse interactions from a human labeller. In this work, we show that a robot can densely annotate a scene with arbitrary discrete or continuous physical properties via its own fully-autonomous experimental interactions, as it simultaneously scans and maps it with an RGB-D camera. A variety of scene interactions are possible, including poking with force sensing to determine rigidity, measuring local material type with single-pixel spectroscopy or predicting force distributions by pushing. Sparse experimental interactions are guided by entropy to enable high efficiency, with tabletop scene properties densely mapped from scratch in a few minutes from a few tens of interactions.
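
The entropy-guided selection mentioned in this abstract can be illustrated with a minimal sketch. It assumes the map yields a discrete class distribution per candidate location and that the next interaction goes to the most uncertain one. The function names and the example scene are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_next_interaction(predictions):
    """Return the candidate location whose predicted distribution is most uncertain.

    predictions: dict mapping a location id to a list of class probabilities.
    """
    return max(predictions, key=lambda loc: entropy(predictions[loc]))


# Hypothetical predicted material distributions at three candidate locations.
predictions = {
    "mug": [0.98, 0.01, 0.01],      # confident prediction, low entropy
    "sponge": [0.40, 0.35, 0.25],   # uncertain prediction, high entropy
    "table": [0.70, 0.20, 0.10],
}
next_location = pick_next_interaction(predictions)
print(next_location)
```

A continuous property would need a different uncertainty measure (for example predictive variance); this sketch covers only the discrete case.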

Feature-Realistic Neural Fusion for Real-Time, Open Set Scene Understanding

Oct 06, 2022

Kirill Mazur, Edgar Sucar, Andrew J. Davison

Abstract: General scene understanding for robotics requires flexible semantic representation, so that novel objects and structures which may not have been known at training time can be identified, segmented and grouped. We present an algorithm which fuses general learned features from a standard pre-trained network into a highly efficient 3D geometric neural field representation during real-time SLAM. The fused 3D feature maps inherit the coherence of the neural field's geometry representation. This means that tiny amounts of human labelling interacting at runtime enable objects or even parts of objects to be robustly and accurately segmented in an open set manner.

* For our project page, see https://makezur.github.io/FeatureRealisticFusion/

iSDF: Real-Time Neural Signed Distance Fields for Robot Perception

Apr 05, 2022

Joseph Ortiz, Alexander Clegg, Jing Dong, Edgar Sucar, David Novotny, Michael Zollhoefer, Mustafa Mukadam

Abstract: We present iSDF, a continual learning system for real-time signed distance field (SDF) reconstruction. Given a stream of posed depth images from a moving camera, it trains a randomly initialised neural network to map an input 3D coordinate to an approximate signed distance. The model is self-supervised by minimising a loss that bounds the predicted signed distance using the distance to the closest sampled point in a batch of query points that are actively sampled. In contrast to prior work based on voxel grids, our neural method is able to provide adaptive levels of detail with plausible filling in of partially observed regions and denoising of observations, all while having a more compact representation. In evaluations against alternative methods on real and synthetic datasets of indoor environments, we find that iSDF produces more accurate reconstructions, and better approximations of collision costs and gradients useful for downstream planners in domains from navigation to manipulation. Code and video results can be found at our project page: https://joeaortiz.github.io/iSDF/.

* Project page: https://joeaortiz.github.io/iSDF/
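
The bounding idea in the iSDF abstract can be sketched as follows. This is a simplified illustration and not the loss from the paper: it assumes that the distance from a free-space query point to the closest sampled surface point is an upper bound on the true distance, and it penalises only predictions that exceed that bound. All names and values are hypothetical.

```python
import math


def distance_bound(query, surface_samples):
    """Upper bound on the distance from `query` to the surface:
    the distance to the closest sampled surface point."""
    return min(math.dist(query, s) for s in surface_samples)


def bound_loss(predicted_sdf, query, surface_samples):
    """Penalise a predicted signed distance only when it exceeds the bound."""
    bound = distance_bound(query, surface_samples)
    return max(0.0, predicted_sdf - bound)


# Hypothetical batch: two surface samples and one free-space query point.
surface_samples = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
query = (0.0, 0.0, 0.0)

print(distance_bound(query, surface_samples))   # closest sample is 1.0 away
print(bound_loss(1.5, query, surface_samples))  # prediction exceeds the bound
print(bound_loss(0.8, query, surface_samples))  # prediction within the bound
```

In a training loop the same quantity would be computed in batched form on the network output; the scalar version here only shows the logic.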

iLabel: Interactive Neural Scene Labelling

Dec 03, 2021

Shuaifeng Zhi, Edgar Sucar, Andre Mouton, Iain Haughton, Tristan Laidlow, Andrew J. Davison

Abstract: Joint representation of geometry, colour and semantics using a 3D neural field enables accurate dense labelling from ultra-sparse interactions as a user reconstructs a scene in real-time using a handheld RGB-D sensor. Our iLabel system requires no training data, yet can densely label scenes more accurately than standard methods trained on large, expensively labelled image datasets. Furthermore, it works in an 'open set' manner, with semantic classes defined on the fly by the user. iLabel's underlying model is a multilayer perceptron (MLP) trained from scratch in real-time to learn a joint neural scene representation. The scene model is updated and visualised in real-time, allowing the user to focus interactions to achieve efficient labelling. A room or similar scene can be accurately labelled into 10+ semantic categories with only a few tens of clicks. Quantitative labelling accuracy scales powerfully with the number of clicks, and rapidly surpasses standard pre-trained semantic segmentation methods. We also demonstrate a hierarchical labelling variant.

* Project page: https://edgarsucar.github.io/ilabel/ Video: https://youtu.be/bL7RZaMhRbk

Incremental Abstraction in Distributed Probabilistic SLAM Graphs

Sep 13, 2021

Joseph Ortiz, Talfan Evans, Edgar Sucar, Andrew J. Davison

Abstract: Scene graphs represent the key components of a scene in a compact and semantically rich way, but are difficult to build during incremental SLAM operation because of the challenges of robustly identifying abstract scene elements and optimising continually changing, complex graphs. We present a distributed, graph-based SLAM framework for incrementally building scene graphs based on two novel components. First, we propose an incremental abstraction framework in which a neural network proposes abstract scene elements that are incorporated into the factor graph of a feature-based monocular SLAM system. Scene elements are confirmed or rejected through optimisation and incrementally replace the points, yielding a more dense, semantic and compact representation. Second, enabled by our novel routing procedure, we use Gaussian Belief Propagation (GBP) for distributed inference on a graph processor. The time per iteration of GBP is structure-agnostic and we demonstrate the speed advantages over direct methods for inference of heterogeneous factor graphs. We run our system on real indoor datasets using planar abstractions and recover the major planes with significant compression.

* 8 pages. Project page: https://joeaortiz.github.io/incremental_abstraction/

iMAP: Implicit Mapping and Positioning in Real-Time

Mar 23, 2021

Edgar Sucar, Shikun Liu, Joseph Ortiz, Andrew J. Davison

Abstract: We show for the first time that a multilayer perceptron (MLP) can serve as the only scene representation in a real-time SLAM system for a handheld RGB-D camera. Our network is trained in live operation without prior data, building a dense, scene-specific implicit 3D model of occupancy and colour which is also immediately used for tracking. Achieving real-time SLAM via continual training of a neural network against a live image stream requires significant innovation. Our iMAP algorithm uses a keyframe structure and multi-processing computation flow, with dynamic information-guided pixel sampling for speed, with tracking at 10 Hz and global map updating at 2 Hz. The advantages of an implicit MLP over standard dense SLAM techniques include efficient geometry representation with automatic detail control and smooth, plausible filling-in of unobserved regions such as the back surfaces of objects.

Neural Object Descriptors for Multi-View Shape Reconstruction

Apr 09, 2020

Edgar Sucar, Kentaro Wada, Andrew Davison

Abstract: The choice of scene representation is crucial in both the shape inference algorithms it requires and the smart applications it enables. We present efficient and optimisable multi-class learned object descriptors together with a novel probabilistic and differential rendering engine, for principled full object shape inference from one or more RGB-D images. Our framework allows for accurate and robust 3D object reconstruction which enables multiple applications including robot grasping and placing, augmented reality, and the first object-level SLAM system capable of optimising object poses and shapes jointly with camera trajectory.

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Apr 09, 2020

Kentaro Wada, Edgar Sucar, Stephen James, Daniel Lenton, Andrew J. Davison

Abstract: Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Recognized precise object models will play an important role alongside non-parametric reconstructions of unrecognized structures. We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves, and performs joint optimization to estimate consistent, non-intersecting poses for multiple objects in contact. We verify the accuracy and robustness of our approach experimentally on 2 object datasets: YCB-Video, and our own challenging Cluttered YCB-Video. We demonstrate a real-time robotics application where a robot arm precisely and orderly disassembles complicated piles of objects, using only on-board RGB-D vision.

* 10 pages, 10 figures, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Bayesian Scale Estimation for Monocular SLAM Based on Generic Object Detection for Correcting Scale Drift

Nov 07, 2017

Edgar Sucar, Jean-Bernard Hayet

Abstract: This work proposes a new, online algorithm for estimating the local scale correction to apply to the output of a monocular SLAM system, in order to obtain a metric reconstruction of the 3D map and of the camera trajectory that is as faithful as possible. Within a Bayesian framework, it integrates observations from a deep-learning based generic object detector and a prior on the evolution of the scale drift. For each observation class, a predefined prior on the heights of the class objects is used. This allows the observation likelihood to be defined. Due to the scale drift inherent to monocular SLAM systems, we integrate a rough model on the dynamics of scale drift. Quantitative evaluations of the system are presented on the KITTI dataset, and compared with different approaches. The results show a superior performance of our proposal in terms of relative translational error when compared to other monocular systems.
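
The combination of a class height prior with a scale drift model can be illustrated with a one-dimensional Gaussian filter. This is a deliberately simplified sketch under assumed Gaussian distributions, not the estimator from the paper. The function names, the random-walk drift model, and all numeric values are hypothetical.

```python
def predict(mean, var, drift_var):
    """Drift step: the scale follows a random walk, so uncertainty grows."""
    return mean, var + drift_var


def update(mean, var, slam_height, class_height_mean, class_height_std):
    """Fuse one detected object into the scale estimate.

    slam_height: object height measured in (unscaled) SLAM map units.
    class_height_mean, class_height_std: assumed metric height prior of the class.
    """
    z = class_height_mean / slam_height        # scale implied by this object
    r = (class_height_std / slam_height) ** 2  # variance of that implied scale
    k = var / (var + r)                        # Kalman gain
    return mean + k * (z - mean), (1.0 - k) * var


# Hypothetical values: current scale belief, then one detection of an object
# whose class is assumed to be 1.5 m tall on average (std 0.15 m).
mean, var = 1.0, 0.04
mean, var = predict(mean, var, drift_var=0.01)
mean, var = update(mean, var, slam_height=0.75,
                   class_height_mean=1.5, class_height_std=0.15)
print(mean, var)
```

The detection implies a scale of 2.0, so the estimate moves from 1.0 towards 2.0 in proportion to the gain, and the variance shrinks after the update.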
