Eric Marchand

Toward Robust Neural Reconstruction from Sparse Point Sets

Dec 20, 2024

Amine Ouasfi, Shubhendu Jena, Eric Marchand, Adnane Boukhayma

Abstract: We consider the challenging problem of learning Signed Distance Functions (SDF) from sparse and noisy 3D point clouds. In contrast to recent methods that depend on smoothness priors, our method, rooted in a distributionally robust optimization (DRO) framework, incorporates a regularization term that leverages samples from the uncertainty regions of the model to improve the learned SDFs. Thanks to tractable dual formulations, we show that this framework enables a stable and efficient optimization of SDFs in the absence of ground truth supervision. Using a variety of synthetic and real data evaluations from different modalities, we show that our DRO-based learning framework can improve SDF learning with respect to baselines and state-of-the-art methods.

* Project page: https://ouasfi.github.io/sdro/

JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields

Mar 27, 2023

Xi Wang, Robin Courant, Jinglei Shi, Eric Marchand, Marc Christie

Abstract: This paper presents JAWS, an optimization-driven approach that achieves the robust transfer of visual cinematic features from a reference in-the-wild video clip to a newly generated clip. To this end, we rely on an implicit neural representation (INR) to compute a clip that shares the same cinematic features as the reference clip. We propose a general formulation of a camera optimization problem in an INR that computes extrinsic and intrinsic camera parameters as well as timing. By leveraging the differentiability of neural representations, we can back-propagate our designed cinematic losses measured on proxy estimators through a NeRF network to the proposed cinematic parameters directly. We also introduce specific enhancements such as guidance maps to improve the overall quality and efficiency. Results display the capacity of our system to replicate well-known camera sequences from movies, adapting the framing, camera parameters and timing of the generated video clip to maximize the similarity with the reference clip.

* CVPR 2023. Project page with videos and code: http://www.lix.polytechnique.fr/vista/projects/2023_cvpr_wang

TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM

Sep 16, 2022

Mathieu Gonzalez, Eric Marchand, Amine Kacete, Jérôme Royan

Abstract: Most classical SLAM systems rely on the static scene assumption, which limits their applicability in real-world scenarios. Recent SLAM frameworks have been proposed to simultaneously track the camera and moving objects. However, they are often unable to estimate the canonical pose of the objects and exhibit a low object tracking accuracy. To solve this problem, we propose TwistSLAM++, a semantic, dynamic SLAM system that fuses stereo images and LiDAR information. Using semantic information, we track potentially moving objects and associate them with 3D object detections in LiDAR scans to obtain their pose and size. Then, we perform registration on consecutive object scans to refine object pose estimation. Finally, object scans are used to estimate the shape of the object and constrain map points to lie on the estimated surface within the bundle adjustment (BA). We show on classical benchmarks that this fusion approach based on multimodal information improves the accuracy of object tracking.

TwistSLAM: Constrained SLAM in Dynamic Environment

Feb 24, 2022

Mathieu Gonzalez, Eric Marchand, Amine Kacete, Jérôme Royan

Abstract: Moving objects are present in most scenes of our life. However, they can be very problematic for classical SLAM algorithms that assume the scene to be rigid. This assumption limits the applicability of those algorithms as they are unable to accurately estimate the camera pose and world structure in many scenarios. Some SLAM systems have been proposed to detect and mask out dynamic objects, making the static scene assumption valid. However, this information can allow the system to track objects within the scene while tracking the camera, which can be crucial for some applications. In this paper, we present TwistSLAM, a semantic, dynamic, stereo SLAM system that can track dynamic objects in the scene. Our algorithm creates clusters of points according to their semantic class. It uses the static parts of the environment to robustly localize the camera and tracks the remaining objects. We propose a new formulation for the tracking and the bundle adjustment that takes into account the characteristics of mechanical joints between clusters to constrain and improve their pose estimation. We evaluate our approach on several sequences from a public dataset and show that we improve camera and object tracking compared to the state of the art.

S3LAM: Structured Scene SLAM

Sep 15, 2021

Mathieu Gonzalez, Eric Marchand, Amine Kacete, Jérôme Royan

Abstract: We propose a new general SLAM system that uses the semantic segmentation of objects and structures in the scene. Semantic information is relevant as it contains high-level information which may make SLAM more accurate and robust. Our contribution is threefold: i) A new SLAM system based on ORB-SLAM2 that creates a semantic map made of clusters of points corresponding to object instances and structures in the scene. ii) A modification of the classical Bundle Adjustment formulation to constrain each cluster using geometrical priors, which improves both camera localization and reconstruction and enables a better understanding of the scene. iii) A new Bundle Adjustment formulation at the level of clusters to improve the convergence of classical Bundle Adjustment. We evaluate our approach on several sequences from a public dataset and show that, with respect to ORB-SLAM2, it improves camera pose estimation.

Tracking Pedestrian Heads in Dense Crowd

Mar 24, 2021

Ramana Sundararaman, Cedric De Almeida Braga, Eric Marchand, Julien Pettre

Abstract: Tracking humans in crowded video sequences is an important constituent of visual scene understanding. Increasing crowd density challenges visibility of humans, limiting the scalability of existing pedestrian trackers to higher crowd densities. For that reason, we propose to revitalize head tracking with the Crowd of Heads Dataset (CroHD), consisting of 9 sequences of 11,463 frames with over 2,276,838 heads and 5,230 tracks annotated in diverse scenes. For evaluation, we propose a new metric, IDEucl, to measure an algorithm's efficacy in preserving a unique identity for the longest stretch in image coordinate space, thus building a correspondence between pedestrian crowd motion and the performance of a tracking algorithm. Moreover, we also propose a new head detector, HeadHunter, which is designed for small head detection in crowded scenes. We extend HeadHunter with a Particle Filter and a color histogram based re-identification module for head tracking. To establish this as a strong baseline, we compare our tracker with existing state-of-the-art pedestrian trackers on CroHD and demonstrate superiority, especially in identity-preserving tracking metrics. With a lightweight head detector and a tracker which is efficient at identity preservation, we believe our contributions will prove useful in the advancement of pedestrian tracking in dense crowds.

YOLOff: You Only Learn Offsets for robust 6DoF object pose estimation

Feb 25, 2020

Mathieu Gonzalez, Amine Kacete, Albert Murienne, Eric Marchand

Abstract: Estimating the 3D translation and orientation of an object is a challenging task that can be considered within augmented reality or robotic applications. In this paper, we propose a novel approach to perform 6 DoF object pose estimation from a single RGB-D image in cluttered scenes. We adopt a hybrid pipeline in two stages: data-driven and geometric respectively. The first data-driven step consists of a classification CNN to estimate the object's 2D location in the image from local patches, followed by a regression CNN trained to predict the 3D location of a set of keypoints in the camera coordinate system. We robustly perform local voting to recover the location of each keypoint in the camera coordinate system. To extract the pose information, the geometric step consists of aligning the 3D points in the camera coordinate system with the corresponding 3D points in the world coordinate system by minimizing a registration error, thus computing the pose. Our experiments on the standard dataset LineMod show that our approach is more robust and accurate than state-of-the-art methods.
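
The geometric step described in this abstract, aligning two sets of corresponding 3D points by minimizing a registration error, can be illustrated with the classical SVD-based (Kabsch) solution. This is only a sketch under assumptions: the abstract does not say which solver the authors use, and the function name `rigid_align` and its arguments are hypothetical.

```python
import numpy as np

def rigid_align(world_pts, camera_pts):
    """Least-squares rigid transform (R, t) such that camera_pts ~= R @ world_pts + t.

    Assumption: the i-th row of each array is the same keypoint, and this
    closed-form Kabsch solver stands in for whatever solver the paper uses.
    """
    world_pts = np.asarray(world_pts, dtype=float)
    camera_pts = np.asarray(camera_pts, dtype=float)

    # Center both point sets on their centroids.
    world_centroid = world_pts.mean(axis=0)
    camera_centroid = camera_pts.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (world_pts - world_centroid).T @ (camera_pts - camera_centroid)

    # The optimal rotation comes from the SVD of the cross-covariance.
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation maps the rotated world centroid onto the camera centroid.
    t = camera_centroid - R @ world_centroid
    return R, t
```

Given at least three non-collinear keypoints, `R` and `t` together give the 6 DoF pose of the object in the camera frame.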

Visual Servoing from Deep Neural Networks

Jun 07, 2017

Quentin Bateux, Eric Marchand, Jürgen Leitner, Francois Chaumette, Peter Corke

Abstract: We present a deep neural network-based method to perform high-precision, robust and real-time 6 DOF visual servoing. The paper describes how to create a dataset simulating various perturbations (occlusions and lighting conditions) from a single real-world image of the scene. A convolutional neural network is fine-tuned using this dataset to estimate the relative pose between two images of the same scene. The output of the network is then employed in a visual servoing control scheme. The method converges robustly even in difficult real-world settings with strong lighting variations and occlusions. A positioning error of less than one millimeter is obtained in experiments with a 6 DOF robot.

* fixed authors list
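
As a rough illustration of how an estimated relative pose can drive a visual servoing loop, the sketch below applies a generic proportional law, velocity = -gain * error, to a 6-vector pose error. The abstract does not give the paper's control law, so everything here is an assumption: the names `control_velocity` and `simulate`, the gain and time step, and the simplified model in which the error changes directly by the commanded velocity. In the paper's setting the error would come from the network's pose estimate rather than from a simulation.

```python
def control_velocity(pose_error, gain=0.5):
    """Proportional control law: command a velocity opposing the pose error.

    pose_error is a 6-vector (3 translation terms, 3 rotation terms).
    """
    return [-gain * e for e in pose_error]


def simulate(pose_error, gain=0.5, dt=0.1, steps=100):
    """Integrate the loop under the simplifying assumption d(error)/dt = velocity."""
    error = list(pose_error)
    for _ in range(steps):
        velocity = control_velocity(error, gain)
        # Euler step: each error component shrinks by a factor (1 - gain * dt).
        error = [e + dt * v for e, v in zip(error, velocity)]
    return error


initial_error = [0.1, -0.05, 0.2, 0.0, 0.1, -0.3]
final_error = simulate(initial_error)
```

With these values each error component is multiplied by 0.95 at every step, so the error decays exponentially towards zero.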
