Francesca Odone

Diffusing DeBias: a Recipe for Turning a Bug into a Feature

Feb 13, 2025

Massimiliano Ciranni, Vito Paolo Pastore, Roberto Di Via, Enzo Tartaglione, Francesca Odone, Vittorio Murino

Abstract: The effectiveness of deep learning models in classification tasks is often challenged by the quality and quantity of training data: when the data contain strong spurious correlations between specific attributes and target labels, the result can be unrecoverable biases in model predictions. Tackling these biases is crucial for improving model generalization and trust, especially in real-world scenarios. This paper presents Diffusing DeBias (DDB), a novel approach acting as a plug-in for common methods in model debiasing while exploiting the inherent bias-learning tendency of diffusion models. Our approach leverages conditional diffusion models to generate synthetic bias-aligned images, which are used to train a bias amplifier model that is then employed as an auxiliary method in different unsupervised debiasing approaches. Our proposed method, which also tackles the common issue of training set memorization typical of this type of techniques, beats the current state of the art on multiple benchmark datasets by significant margins, demonstrating its potential as a versatile and effective tool for tackling dataset bias in deep learning applications.

* 29 Pages, 12 Figures

Transferring disentangled representations: bridging the gap between synthetic and real images

Sep 26, 2024

Jacopo Dapueto, Nicoletta Noceti, Francesca Odone

Abstract: Developing meaningful and efficient representations that separate the fundamental structure of the data generation mechanism is crucial in representation learning. However, Disentangled Representation Learning has not fully shown its potential on real images, because of correlated generative factors, their resolution and limited access to ground truth labels. Specifically on the latter, we investigate the possibility of leveraging synthetic data to learn general-purpose disentangled representations applicable to real data, discussing the effect of fine-tuning and what properties of disentanglement are preserved after the transfer. We provide an extensive empirical study to address these issues. In addition, we propose a new interpretable intervention-based metric to measure the quality of factor encoding in the representation. Our results indicate that some level of disentanglement, transferring a representation from synthetic to real data, is possible and effective.

Looking at Model Debiasing through the Lens of Anomaly Detection

Jul 25, 2024

Vito Paolo Pastore, Massimiliano Ciranni, Davide Marinelli, Francesca Odone, Vittorio Murino

Abstract: It is widely recognized that deep neural networks are sensitive to bias in the data. This means that during training these models are likely to learn spurious correlations between data and labels, resulting in limited generalization abilities and low performance. In this context, model debiasing approaches can be devised aiming at reducing the model's dependency on such unwanted correlations, either leveraging the knowledge of bias information or not. In this work, we focus on the latter and more realistic scenario, showing the importance of accurately predicting the bias-conflicting and bias-aligned samples to obtain compelling performance in bias mitigation. On this ground, we propose to conceive the problem of model bias from an out-of-distribution perspective, introducing a new bias identification method based on anomaly detection. We claim that when data is mostly biased, bias-conflicting samples can be regarded as outliers with respect to the bias-aligned distribution in the feature space of a biased model, thus allowing for precisely detecting them with an anomaly detection method. Coupling the proposed bias identification approach with bias-conflicting data upsampling and augmentation in a two-step strategy, we reach state-of-the-art performance on synthetic and real benchmark datasets. Ultimately, our proposed approach shows that the data bias issue does not necessarily require complex debiasing methods, given that an accurate bias identification procedure is defined.

* 15 pages, 7 figures
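
The idea of treating bias-conflicting samples as outliers in a biased model's feature space can be illustrated with a toy sketch. This is not the paper's method: the anomaly detector here is a plain z-score on distance from the centroid, and the function name `flag_outliers`, the threshold, and the 2-D feature vectors are all hypothetical stand-ins chosen for illustration.

```python
import math
from statistics import mean, pstdev

def flag_outliers(features, z_threshold=2.0):
    """Return indices of samples unusually far from the feature centroid.

    Assumption: a simple z-score rule stands in for a real anomaly detector.
    """
    dim = len(features[0])
    # Centroid of all samples; with mostly bias-aligned data it sits near them.
    centroid = [mean(f[d] for f in features) for d in range(dim)]
    dists = [math.dist(f, centroid) for f in features]
    mu, sigma = mean(dists), pstdev(dists)
    if sigma == 0:
        return []  # all samples equidistant: nothing to flag
    return [i for i, d in enumerate(dists) if (d - mu) / sigma > z_threshold]

# Hypothetical features: nine bias-aligned samples near the origin,
# one bias-conflicting sample far away (index 9).
toy_features = [
    (0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1), (0.1, 0.1),
    (-0.1, -0.1), (0.1, -0.1), (-0.1, 0.1), (0.0, 0.0), (3.0, 3.0),
]
suspected_conflicting = flag_outliers(toy_features)
```

In a two-step strategy like the one the abstract describes, the flagged indices would then be the candidates for upsampling and augmentation.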

Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images

Jul 25, 2024

Roberto Di Via, Francesca Odone, Vito Paolo Pastore

Abstract: In the last few years, deep neural networks have been extensively applied in the medical domain for different tasks, ranging from image classification and segmentation to landmark detection. However, the application of these technologies in the medical domain is often hindered by data scarcity, both in terms of available annotations and images. This study introduces a new self-supervised pre-training protocol based on diffusion models for landmark detection in x-ray images. Our results show that the proposed self-supervised framework can provide accurate landmark detection with a minimal number of available annotated training images (up to 50), outperforming ImageNet supervised pre-training and state-of-the-art self-supervised pre-trainings for three popular x-ray benchmark datasets. To our knowledge, this is the first exploration of diffusion models for self-supervised learning in landmark detection, which may offer a valuable pre-training approach in few-shot regimes, for mitigating data scarcity.

Is in-domain data beneficial in transfer learning for landmarks detection in x-ray images?

Mar 03, 2024

Roberto Di Via, Matteo Santacesaria, Francesca Odone, Vito Paolo Pastore

Abstract: In recent years, deep learning has emerged as a promising technique for medical image analysis. However, this application domain is likely to suffer from a limited availability of large public datasets and annotations. A common solution to these challenges in deep learning is the usage of a transfer learning framework, typically with a fine-tuning protocol, where a large-scale source dataset is used to pre-train a model, further fine-tuned on the target dataset. In this paper, we present a systematic study analyzing whether the usage of small-scale in-domain x-ray image datasets may provide any improvement for landmark detection over models pre-trained on large natural image datasets only. We focus on the multi-landmark localization task for three datasets, including chest, head, and hand x-ray images. Our results show that using in-domain source datasets brings marginal or no benefit with respect to an ImageNet out-of-domain pre-training. Our findings can provide an indication for the development of robust landmark detection systems in medical images when no large annotated dataset is available.

* Accepted to ISBI 2024

View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection

May 29, 2023

Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone

Abstract: For autonomous vehicles, driving safely is highly dependent on the capability to correctly perceive the environment in 3D space, hence the task of 3D object detection represents a fundamental aspect of perception. While 3D sensors deliver accurate metric perception, monocular approaches enjoy cost and availability advantages that are valuable in a wide range of applications. Unfortunately, training monocular methods requires a vast amount of annotated data. Interestingly, self-supervised approaches have recently been successfully applied to ease the training process and unlock access to widely available unlabelled data. While related research leverages different priors including LIDAR scans and stereo images, such priors again limit usability. Therefore, in this work, we propose a novel approach to self-supervise 3D object detection from RGB sequences alone, leveraging multi-view constraints and weak labels. Our experiments on the KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods using LIDAR scans or stereo images.

Fine-tuning or top-tuning? Transfer learning with pretrained features and fast kernel methods

Sep 16, 2022

Paolo Didier Alfano, Vito Paolo Pastore, Lorenzo Rosasco, Francesca Odone

Abstract: The impressive performance of deep learning architectures is associated with a massive increase in model complexity. Millions of parameters need to be tuned, with training and inference time scaling accordingly. But is massive fine-tuning necessary? In this paper, focusing on image classification, we consider a simple transfer learning approach exploiting pretrained convolutional features as input for a fast kernel method. We refer to this approach as top-tuning, since only the kernel classifier is trained. By performing more than 2500 training processes, we show that this top-tuning approach provides comparable accuracy w.r.t. fine-tuning, with a training time that is between one and two orders of magnitude smaller. These results suggest that top-tuning provides a useful alternative to fine-tuning on small/medium datasets, especially when training efficiency is crucial.
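
A rough sketch of the top-tuning idea follows: the feature extractor is frozen and only a kernel classifier on top is fitted. Everything here is an assumption made for illustration. The tiny 2-D vectors stand in for pretrained convolutional features, plain Gaussian-kernel ridge regression stands in for the fast kernel method used in the paper, and the names `fit_top` and `predict` are hypothetical.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_top(features, labels, lam=1e-3, sigma=1.0):
    """Fit kernel ridge coefficients: solve (K + lam * I) alpha = y."""
    n = len(features)
    K = [[gaussian_kernel(features[i], features[j], sigma)
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(K, labels)

def predict(features, alpha, x, sigma=1.0):
    """Classify x by the sign of the kernel expansion."""
    score = sum(a * gaussian_kernel(f, x, sigma)
                for a, f in zip(alpha, features))
    return 1 if score >= 0 else -1

# Hypothetical frozen features for two classes (labels -1 and +1).
train_features = [(0.0, 0.0), (0.2, 0.1), (2.0, 2.0), (2.1, 1.9)]
train_labels = [-1.0, -1.0, 1.0, 1.0]
alpha = fit_top(train_features, train_labels)
pred_near_first = predict(train_features, alpha, (0.1, 0.0))
pred_near_second = predict(train_features, alpha, (1.9, 2.0))
```

The only trained quantities are the coefficients `alpha`; the features are never updated, which is what distinguishes this setup from fine-tuning.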

Efficient Unsupervised Learning for Plankton Images

Sep 14, 2022

Paolo Didier Alfano, Marco Rando, Marco Letizia, Francesca Odone, Lorenzo Rosasco, Vito Paolo Pastore

Abstract: Monitoring plankton populations in situ is fundamental to preserving the aquatic ecosystem. Plankton microorganisms are in fact susceptible to minor environmental perturbations, which can be reflected in morphological and dynamical modifications. Nowadays, the availability of advanced automatic or semi-automatic acquisition systems allows the production of an increasingly large amount of plankton image data. The adoption of machine learning algorithms to classify such data may be affected by the significant cost of manual annotation, due to both the huge quantity of acquired data and the large number of plankton species. To address these challenges, we propose an efficient unsupervised learning pipeline to provide accurate classification of plankton microorganisms. We build a set of image descriptors exploiting a two-step procedure. First, a Variational Autoencoder (VAE) is trained on features extracted by a pre-trained neural network. We then use the learnt latent space as image descriptor for clustering. We compare our method with state-of-the-art unsupervised approaches, where a set of pre-defined hand-crafted features is used for clustering of plankton images. The proposed pipeline outperforms the benchmark algorithms for all the plankton datasets included in our analysis, providing better image embedding properties.

* 13 pages. Accepted at the 26th International Conference on Pattern Recognition (ICPR 2022)

FasterVideo: Efficient Online Joint Object Detection And Tracking

Apr 15, 2022

Issa Mouawad, Francesca Odone

Abstract: Object detection and tracking in videos represent essential and computationally demanding building blocks for current and future visual perception systems. In order to reduce the efficiency gap between available methods and computational requirements of real-world applications, we propose to re-think one of the most successful methods for image object detection, Faster R-CNN, and extend it to the video domain. Specifically, we extend the detection framework to learn instance-level embeddings which prove beneficial for data association and re-identification purposes. Focusing on the computational aspects of detection and tracking, our proposed method reaches the very high computational efficiency necessary for relevant applications, while still managing to compete with recent and state-of-the-art methods, as shown in the experiments we conduct on standard object tracking benchmarks.

* Accepted at 21st International Conference on Image Analysis and Processing (ICIAP 2021)

Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D Object Detection

Mar 04, 2022

Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone

Abstract: Monocular 3D object detection continues to attract attention due to the cost benefits and wider availability of RGB cameras. Despite the recent advances and the ability to acquire data at scale, annotation cost and complexity still limit the size of 3D object detection datasets in the supervised settings. Self-supervised methods, on the other hand, aim at training deep networks relying on pretext tasks or various consistency constraints. Moreover, other 3D perception tasks (such as depth estimation) have shown the benefits of temporal priors as a self-supervision signal. In this work, we argue that temporal consistency at the level of object poses provides an important supervision signal, given the strong prior on physical motion. Specifically, we propose a self-supervised loss which uses this consistency, in addition to render-and-compare losses, to refine noisy pose predictions and derive high-quality pseudo-labels. To assess the effectiveness of the proposed method, we fine-tune a synthetically trained monocular 3D object detection model using the pseudo-labels that we generated on real data. Evaluation on the standard KITTI3D benchmark demonstrates that our method reaches competitive performance compared to other monocular self-supervised and supervised methods.
