David Filliat

U2IS

Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation

Dec 18, 2024

Rémi Marsal, Alexandre Chapoutot, Philippe Xu, David Filliat

Abstract: The recent development of foundation models for monocular depth estimation, such as Depth Anything, paved the way to zero-shot monocular depth estimation. Since Depth Anything returns an affine-invariant disparity map, the favored technique to recover the metric depth consists of fine-tuning the model. However, this stage is costly, both because of the training itself and because of the creation of the dataset, which must contain images captured by the camera that will be used at test time along with the corresponding ground truth. Moreover, fine-tuning may degrade the generalization capacity of the original model. Instead, we propose in this paper a new method to rescale Depth Anything predictions using 3D points provided by low-cost sensors or techniques such as low-resolution LiDAR, a stereo camera, or structure-from-motion where poses are given by an IMU. This approach thus avoids fine-tuning and preserves the generalization power of the original depth estimation model while being robust to the noise of the sensor or of the depth model. Our experiments highlight improvements relative to other metric depth estimation methods and competitive results compared to fine-tuned approaches. Code available at https://gitlab.ensta.fr/ssh/monocular-depth-rescaling.
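To make the idea of rescaling an affine-invariant disparity map concrete, here is a minimal sketch of the simplest possible baseline: a global least-squares fit of one scale and one shift against sparse metric depth points. This is not the test-time adaptation method of the paper, and the function names, the mask convention, and the `eps` clipping value are all assumptions chosen for illustration.

```python
import numpy as np


def fit_scale_shift(pred_disparity, sparse_depth, valid_mask):
    """Fit (scale, shift) so that scale * disparity + shift ~ 1 / metric depth.

    Hypothetical helper: only pixels where valid_mask is True (i.e. where the
    low-cost sensor returned a depth value) take part in the fit.
    """
    d = pred_disparity[valid_mask].astype(np.float64)
    target = 1.0 / sparse_depth[valid_mask].astype(np.float64)
    # Design matrix [d, 1] for an ordinary least-squares affine fit.
    A = np.stack([d, np.ones_like(d)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, target, rcond=None)
    return scale, shift


def to_metric_depth(pred_disparity, scale, shift, eps=1e-6):
    """Apply the fitted affine map, then invert disparity into depth."""
    metric_disparity = np.clip(scale * pred_disparity + shift, eps, None)
    return 1.0 / metric_disparity
```

A plain least-squares fit like this is sensitive to outliers in the sensor points, which is presumably one reason the abstract stresses robustness to sensor and model noise.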

Hierarchical Light Transformer Ensembles for Multimodal Trajectory Forecasting

Mar 26, 2024

Adrien Lafage, Mathieu Barbier, Gianni Franchi, David Filliat

Abstract: Accurate trajectory forecasting is crucial for the performance of various systems, such as advanced driver-assistance systems and self-driving vehicles. These forecasts make it possible to anticipate events leading to collisions and, therefore, to mitigate them. Deep Neural Networks have excelled in motion forecasting, but issues like overconfidence and uncertainty quantification persist. Deep Ensembles address these concerns, yet applying them to multimodal distributions remains challenging. In this paper, we propose a novel approach named Hierarchical Light Transformer Ensembles (HLT-Ens), aimed at efficiently training an ensemble of Transformer architectures using a novel hierarchical loss function. HLT-Ens leverages grouped fully connected layers, inspired by grouped convolution techniques, to effectively capture multimodal distributions. Through extensive experimentation, we demonstrate that HLT-Ens achieves state-of-the-art performance levels, offering a promising avenue for improving trajectory forecasting techniques.
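The notion of a grouped fully connected layer can be illustrated with a short sketch: the input features are split into groups, and each group gets its own small weight matrix, which is equivalent to a dense layer with a block-diagonal weight. This shows only the generic idea; the shapes, the function name `grouped_linear`, and the use of NumPy are assumptions, not the HLT-Ens implementation.

```python
import numpy as np


def grouped_linear(x, weights, biases):
    """Grouped fully connected layer (illustrative sketch).

    x:       (batch, groups * in_per_group)
    weights: (groups, in_per_group, out_per_group), one matrix per group
    biases:  (groups, out_per_group)
    Returns: (batch, groups * out_per_group)
    """
    groups, in_per_group, out_per_group = weights.shape
    batch = x.shape[0]
    # Split the feature axis into independent groups.
    xg = x.reshape(batch, groups, in_per_group)
    # Each group is multiplied by its own weight matrix only.
    yg = np.einsum("bgi,gio->bgo", xg, weights) + biases
    return yg.reshape(batch, groups * out_per_group)
```

Because no weights connect different groups, each group can act as one member of an ensemble while all members are evaluated in a single batched operation.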

On Double Descent in Reinforcement Learning with LSTD and Random Features

Oct 20, 2023

David Brellmann, Eloïse Berthier, David Filliat, Goran Frehse

Abstract: Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and $l_2$-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime in which this ratio is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Squares Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and the number of states go to infinity while maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Square Bellman Error (MSBE) that feature correction terms responsible for the double descent. These correction terms vanish when the $l_2$-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
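For readers unfamiliar with the objects named in this abstract, the following sketch shows a textbook form of regularized LSTD on random features together with the empirical MSBE. The ReLU feature map, the unscaled ridge term `lam`, and all function names are assumptions made for illustration; the paper may use a different feature map and a different normalization of the regularizer.

```python
import numpy as np


def random_features(states, W):
    """Random ReLU features: W is drawn once and kept fixed (never trained)."""
    return np.maximum(states @ W, 0.0)


def regularized_lstd(phi, phi_next, rewards, gamma, lam):
    """Solve (Phi^T (Phi - gamma * Phi') + lam * I) theta = Phi^T r.

    phi, phi_next: (n_transitions, n_params) features of s and s'
    rewards:       (n_transitions,)
    """
    n_params = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(n_params)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


def empirical_msbe(phi, phi_next, rewards, gamma, theta):
    """Mean squared Bellman residual on the collected transitions."""
    residual = rewards + gamma * (phi_next @ theta) - phi @ theta
    return float(np.mean(residual ** 2))
```

In this notation, the parameter/state ratio discussed in the abstract corresponds to `n_params` divided by the number of distinct visited states, so sweeping the number of columns of `W` across that count is the natural experiment for looking at the behavior around a ratio of one.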

InfraParis: A multi-modal and multi-task autonomous driving dataset

Sep 27, 2023

Gianni Franchi, Marwane Hariat, Xuanlong Yu, Nacim Belkhir, Antoine Manzanera, David Filliat

Abstract: Current deep neural networks (DNNs) for autonomous driving computer vision are typically trained on specific datasets that only involve a single type of data and urban scenes. Consequently, these models struggle to handle new objects, noise, nighttime conditions, and diverse scenarios, even though handling them is essential for safety-critical applications. Despite ongoing efforts to enhance the resilience of computer vision DNNs, progress has been sluggish, partly due to the absence of benchmarks featuring multiple modalities. We introduce a novel and versatile dataset named InfraParis that supports multiple tasks across three modalities: RGB, depth, and infrared. We assess various state-of-the-art baseline techniques, encompassing models for the tasks of semantic segmentation, object detection, and depth estimation.

* 15 pages, 7 figures

VIBR: Learning View-Invariant Value Functions for Robust Visual Control

Jun 14, 2023

Tom Dupuis, Jaonary Rabarisoa, Quoc-Cuong Pham, David Filliat

Abstract: End-to-end reinforcement learning on images has shown significant progress in recent years. Data-based approaches leverage data augmentation and domain randomization, while representation learning methods use auxiliary losses to learn task-relevant features. Yet, reinforcement learning still struggles in visually diverse environments full of distractions and spurious noise. In this work, we tackle the problem of robust visual control at its core and present VIBR (View-Invariant Bellman Residuals), a method that combines multi-view training and invariant prediction to reduce the out-of-distribution (OOD) generalization gap for RL-based visuomotor control. Our model-free approach improves baseline performance without the need for additional representation learning objectives and with limited additional computational cost. We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation. Our approach achieves state-of-the-art results on the Distracting Control Suite benchmark, a challenging benchmark still not solved by current methods, where we evaluate the robustness to a number of visual perturbators, as well as OOD generalization and extrapolation capabilities.

Latent Discriminant deterministic Uncertainty

Jul 20, 2022

Gianni Franchi, Xuanlong Yu, Andrei Bursuc, Emanuel Aldea, Severine Dubuisson, David Filliat

Abstract: Predictive uncertainty estimation is essential for deploying Deep Neural Networks in real-world autonomous systems. However, most successful approaches are computationally intensive. In this work, we attempt to address these challenges in the context of autonomous driving perception tasks. Recently proposed Deterministic Uncertainty Methods (DUM) can only partially meet such requirements, as their scalability to complex computer vision tasks is not obvious. In this work, we advance a scalable and effective DUM for high-resolution semantic segmentation that relaxes the Lipschitz constraint typically hindering the practicality of such architectures. We learn a discriminant latent space by leveraging a distinction maximization layer over an arbitrarily-sized set of trainable prototypes. Our approach achieves competitive results over Deep Ensembles, the state-of-the-art for uncertainty prediction, on image classification, segmentation and monocular depth estimation tasks. Our code is available at https://github.com/ENSTA-U2IS/LDU

* 24 pages. Accepted at ECCV 2022

MUAD: Multiple Uncertainties for Autonomous Driving benchmark for multiple uncertainty types and tasks

Mar 02, 2022

Gianni Franchi, Xuanlong Yu, Andrei Bursuc, Rémi Kazmierczak, Séverine Dubuisson, Emanuel Aldea, David Filliat

Abstract: Predictive uncertainty estimation is essential for deploying Deep Neural Networks in real-world autonomous systems. However, disentangling the different types and sources of uncertainty is nontrivial in most datasets, especially since there is no ground truth for uncertainty. In addition, different degrees of weather conditions can disrupt neural networks, resulting in inconsistent training data quality. Thus, we introduce the MUAD dataset (Multiple Uncertainties for Autonomous Driving), consisting of 8,500 realistic synthetic images with diverse adverse weather conditions (night, fog, rain, snow), out-of-distribution objects, and annotations for semantic segmentation, depth estimation, object and instance detection. MUAD allows a better assessment of the impact of different sources of uncertainty on model performance. We propose a study that shows the importance of having reliable Deep Neural Networks (DNNs) in multiple experiments, and will release our dataset to allow researchers to benchmark their algorithms methodically in adverse conditions. More information and the download link for MUAD are available at https://muad-dataset.github.io/.

A study of deep perceptual metrics for image quality assessment

Feb 17, 2022

Rémi Kazmierczak, Gianni Franchi, Nacim Belkhir, Antoine Manzanera, David Filliat

Abstract: Several metrics exist to quantify the similarity between images, but they are ineffective when it comes to measuring the similarity of highly distorted images. In this work, we propose to empirically investigate perceptual metrics based on deep neural networks for tackling the Image Quality Assessment (IQA) task. We study deep perceptual metrics according to different hyperparameters, such as the network's architecture or training procedure. Finally, we propose our multi-resolution perceptual metric (MR-Perceptual), which allows us to aggregate perceptual information at different resolutions and outperforms standard perceptual metrics on IQA tasks with varying image deformations. Our code is available at https://github.com/ENSTA-U2IS/MR_perceptual

Efficient State Representation Learning for Dynamic Robotic Scenarios

Sep 17, 2021

Zhaorun Chen, Liang Gong, Te Sun, Binhao Chen, Shenghan Xie, David Filliat, Natalia Díaz-Rodríguez

Abstract: While the rapid progress of deep learning fuels end-to-end reinforcement learning (RL), direct application, especially in high-dimensional spaces such as robotic scenarios, still suffers from low sample efficiency. State Representation Learning (SRL) was therefore proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states. However, SRL is usually implemented with a decoupling strategy in which the observation-state mapping is learned separately, which is prone to over-fitting. To handle this problem, we present a new algorithm called Policy Optimization via Abstract Representation, which integrates SRL into the original RL scale. First, we use the RL loss to assist in updating the SRL model so that the states can evolve to meet the demands of reinforcement learning and maintain a good physical interpretation. Second, we introduce a dynamic parameter adjustment mechanism so that both models can efficiently adapt to each other. Third, we introduce a new prior called domain resemblance to leverage expert demonstrations to train the SRL model. Finally, we provide real-time access through a state graph to monitor the course of learning. Results show that our algorithm outperforms the PPO baselines and decoupling strategies in terms of sample efficiency and final rewards. Thus, our model can efficiently deal with tasks in high dimensions and facilitate training real-life robots directly from scratch.

* 7 pages, we submit this to ICRA 2022 for review

Are standard Object Segmentation models sufficient for Learning Affordance Segmentation?

Jul 05, 2021

Hugo Caselles-Dupré, Michael Garcia-Ortiz, David Filliat

Abstract: Affordances are the possibilities of actions the environment offers to the individual. Ordinary objects (hammer, knife) usually have many affordances (grasping, pounding, cutting), and detecting these allows artificial agents to understand what their possibilities are in the environment, with obvious applications in robotics. Proposed benchmarks and state-of-the-art prediction models for supervised affordance segmentation are usually modifications of popular object segmentation models such as Mask R-CNN. We observe that, theoretically, these popular object segmentation methods should be sufficient for detecting affordance masks. So we ask the question: is it necessary to tailor new architectures to the problem of learning affordances? We show that applying the out-of-the-box Mask R-CNN to the problem of affordance segmentation outperforms the current state-of-the-art. We conclude that the problem of supervised affordance segmentation is included in the problem of object segmentation and argue that better benchmarks for affordance learning should include action capacities.
