Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nacim Belkhir

Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation

Dec 04, 2024

Gianni Franchi, Dat Nguyen Trong, Nacim Belkhir, Guoxuan Xia, Andrea Pilzer

Abstract:Uncertainty quantification in text-to-image (T2I) generative models is crucial for understanding model behavior and improving output reliability. In this paper, we are the first to quantify and evaluate the uncertainty of T2I models with respect to the prompt. Alongside adapting existing approaches designed to measure uncertainty in the image space, we also introduce Prompt-based UNCertainty Estimation for T2I models (PUNC), a novel method leveraging Large Vision-Language Models (LVLMs) to better address uncertainties arising from the semantics of the prompt and generated images. PUNC utilizes a LVLM to caption a generated image, and then compares the caption with the original prompt in the more semantically meaningful text space. PUNC also enables the disentanglement of both aleatoric and epistemic uncertainties via precision and recall, which image-space approaches are unable to do. Extensive experiments demonstrate that PUNC outperforms state-of-the-art uncertainty estimation techniques across various settings. Uncertainty quantification in text-to-image generation models can be used on various applications including bias detection, copyright protection, and OOD detection. We also introduce a comprehensive dataset of text prompts and generation pairs to foster further research in uncertainty quantification for generative models. Our findings illustrate that PUNC not only achieves competitive performance but also enables novel applications in evaluating and improving the trustworthiness of text-to-image models.

* 28 pages and 22 figures

Via

Access Paper or Ask Questions

NECO: NEural Collapse Based Out-of-distribution detection

Oct 12, 2023

Mouïn Ben Ammar, Nacim Belkhir, Sebastian Popescu, Antoine Manzanera, Gianni Franchi

Figure 1 for NECO: NEural Collapse Based Out-of-distribution detection

Figure 2 for NECO: NEural Collapse Based Out-of-distribution detection

Figure 3 for NECO: NEural Collapse Based Out-of-distribution detection

Figure 4 for NECO: NEural Collapse Based Out-of-distribution detection

Abstract:Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that ``neural collapse'', a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of ``neural collapse'' and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. We plan to release the code after the anonymity period.

* 28 pages

Via

Access Paper or Ask Questions

InfraParis: A multi-modal and multi-task autonomous driving dataset

Sep 27, 2023

Gianni Franchi, Marwane Hariat, Xuanlong Yu, Nacim Belkhir, Antoine Manzanera, David Filliat

Abstract:Current deep neural networks (DNNs) for autonomous driving computer vision are typically trained on specific datasets that only involve a single type of data and urban scenes. Consequently, these models struggle to handle new objects, noise, nighttime conditions, and diverse scenarios, which is essential for safety-critical applications. Despite ongoing efforts to enhance the resilience of computer vision DNNs, progress has been sluggish, partly due to the absence of benchmarks featuring multiple modalities. We introduce a novel and versatile dataset named InfraParis that supports multiple tasks across three modalities: RGB, depth, and infrared. We assess various state-of-the-art baseline techniques, encompassing models for the tasks of semantic segmentation, object detection, and depth estimation.

* 15 pages, 7 figures

Via

Access Paper or Ask Questions

A study of deep perceptual metrics for image quality assessment

Feb 17, 2022

Rémi Kazmierczak, Gianni Franchi, Nacim Belkhir, Antoine Manzanera, David Filliat

Figure 1 for A study of deep perceptual metrics for image quality assessment

Figure 2 for A study of deep perceptual metrics for image quality assessment

Figure 3 for A study of deep perceptual metrics for image quality assessment

Figure 4 for A study of deep perceptual metrics for image quality assessment

Abstract:Several metrics exist to quantify the similarity between images, but they are inefficient when it comes to measure the similarity of highly distorted images. In this work, we propose to empirically investigate perceptual metrics based on deep neural networks for tackling the Image Quality Assessment (IQA) task. We study deep perceptual metrics according to different hyperparameters like the network's architecture or training procedure. Finally, we propose our multi-resolution perceptual metric (MR-Perceptual), that allows us to aggregate perceptual information at different resolutions and outperforms standard perceptual metrics on IQA tasks with varying image deformations. Our code is available at https://github.com/ENSTA-U2IS/MR_perceptual

Via

Access Paper or Ask Questions

Reliable Semantic Segmentation with Superpixel-Mix

Aug 02, 2021

Gianni Franchi, Nacim Belkhir, Mai Lan Ha, Yufei Hu, Andrei Bursuc, Volker Blanz, Angela Yao

Figure 1 for Reliable Semantic Segmentation with Superpixel-Mix

Figure 2 for Reliable Semantic Segmentation with Superpixel-Mix

Figure 3 for Reliable Semantic Segmentation with Superpixel-Mix

Figure 4 for Reliable Semantic Segmentation with Superpixel-Mix

Abstract:Along with predictive performance and runtime speed, reliability is a key requirement for real-world semantic segmentation. Reliability encompasses robustness, predictive uncertainty and reduced bias. To improve reliability, we introduce Superpixel-mix, a new superpixel-based data augmentation method with teacher-student consistency training. Unlike other mixing-based augmentation techniques, mixing superpixels between images is aware of object boundaries, while yielding consistent gains in segmentation accuracy. Our proposed technique achieves state-of-the-art results in semi-supervised semantic segmentation on the Cityscapes dataset. Moreover, Superpixel-mix improves the reliability of semantic segmentation by reducing network uncertainty and bias, as confirmed by competitive results under strong distributions shift (adverse weather, image corruptions) and when facing out-of-distribution data.

* 25 pages

Via

Access Paper or Ask Questions

Learning Deep Morphological Networks with Neural Architecture Search

Jun 14, 2021

Yufei Hu, Nacim Belkhir, Jesus Angulo, Angela Yao, Gianni Franchi

Figure 1 for Learning Deep Morphological Networks with Neural Architecture Search

Figure 2 for Learning Deep Morphological Networks with Neural Architecture Search

Figure 3 for Learning Deep Morphological Networks with Neural Architecture Search

Figure 4 for Learning Deep Morphological Networks with Neural Architecture Search

Abstract:Deep Neural Networks (DNNs) are generated by sequentially performing linear and non-linear processes. Using a combination of linear and non-linear procedures is critical for generating a sufficiently deep feature space. The majority of non-linear operators are derivations of activation functions or pooling functions. Mathematical morphology is a branch of mathematics that provides non-linear operators for a variety of image processing problems. We investigate the utility of integrating these operations in an end-to-end deep learning framework in this paper. DNNs are designed to acquire a realistic representation for a particular job. Morphological operators give topological descriptors that convey salient information about the shapes of objects depicted in images. We propose a method based on meta-learning to incorporate morphological operators into DNNs. The learned architecture demonstrates how our novel morphological operations significantly increase DNN performance on various tasks, including picture classification and edge detection.

* 19 pages

Via

Access Paper or Ask Questions