Antonio Almudévar

There Was Never a Bottleneck in Concept Bottleneck Models

Jun 05, 2025

Antonio Almudévar, José Miguel Hernández-Lobato, Alfonso Ortega

Abstract:Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning representations that support target task performance while ensuring that each component predicts a concrete concept from a predefined set. In this work, we argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept. This shortcoming raises concerns regarding interpretability and the validity of intervention procedures. To overcome this limitation, we propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept. This IB is implemented via a variational regularization term added to the training loss. As a result, MCBMs support concept-level interventions with theoretical guarantees, remain consistent with Bayesian principles, and offer greater flexibility in key design choices.

Aligning Multimodal Representations through an Information Bottleneck

Jun 05, 2025

Antonio Almudévar, José Miguel Hernández-Lobato, Sameer Khurana, Ricard Marxer, Alfonso Ortega

Abstract:Contrastive losses have been extensively used as a tool for multimodal representation learning. However, it has been empirically observed that they are not effective at learning an aligned representation space. In this paper, we argue that this phenomenon is caused by the presence of modality-specific information in the representation space. Although some of the most widely used contrastive losses maximize the mutual information between representations of both modalities, they are not designed to remove the modality-specific information. We give a theoretical description of this problem through the lens of the Information Bottleneck Principle. We also empirically analyze how different hyperparameters affect the emergence of this phenomenon in a controlled experimental setup. Finally, we propose a regularization term in the loss function that is derived by means of a variational approximation and aims to increase the representational alignment. We analyze the advantages of including this regularization term in a set of controlled experiments and real-world applications.
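
As a point of reference for the kind of contrastive loss this abstract discusses, here is a minimal sketch of a symmetric InfoNCE-style loss over paired embeddings from two modalities. It is an illustration only, not the authors' code, and it does not include the regularization term the paper proposes. The function names (`cosine`, `cross_entropy_diag`, `symmetric_info_nce`) and the `temperature` default are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a stable log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)


def symmetric_info_nce(xs, ys, temperature=0.1):
    """Average of the x-to-y and y-to-x InfoNCE terms.

    xs[i] and ys[i] are assumed to be embeddings of the same item
    in two different modalities (a positive pair).
    """
    logits = [[cosine(x, y) / temperature for y in ys] for x in xs]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


xs = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(xs, [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_info_nce(xs, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each embedding is most similar to its own pair, which is the property the abstract says is not sufficient on its own to remove modality-specific information.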

Angular Distance Distribution Loss for Audio Classification

Oct 31, 2024

Antonio Almudévar, Romain Serizel, Alfonso Ortega

Abstract:Classification is a pivotal task in deep learning not only because of its intrinsic importance, but also for providing embeddings with desirable properties in other tasks. To optimize these properties, a wide variety of loss functions have been proposed that attempt to minimize the intra-class distance and maximize the inter-class distance in the embeddings space. In this paper we argue that, in addition to these two, eliminating hierarchies within and among classes are two other desirable properties for classification embeddings. Furthermore, we propose the Angular Distance Distribution (ADD) Loss, which aims to enhance the four previous properties jointly. For this purpose, it imposes conditions on the first and second order statistical moments of the angular distance between embeddings. Finally, we perform experiments showing that our loss function improves all four properties and, consequently, performs better than other loss functions in audio classification tasks.
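
The quantities this abstract refers to, the first and second order moments of the angular distance between embeddings, can be illustrated with a short sketch. It only computes those statistics for same-class and different-class pairs; it is not the ADD Loss itself, whose exact form is not given here. The names `angular_distance` and `angular_moments` are hypothetical.

```python
import math
from statistics import mean, pvariance


def angular_distance(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def angular_moments(embeddings, labels):
    """Mean and population variance of intra- and inter-class angles.

    Assumes at least one same-class pair and one different-class pair;
    otherwise statistics.mean raises StatisticsError.
    """
    intra, inter = [], []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = angular_distance(embeddings[i], embeddings[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return {
        "intra": (mean(intra), pvariance(intra)),
        "inter": (mean(inter), pvariance(inter)),
    }


stats = angular_moments(
    [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]],
    [0, 0, 1, 1],
)
print(stats["intra"][0] < stats["inter"][0])
```

A loss in this spirit would presumably push the intra-class mean down, the inter-class mean up, and constrain the variances, but how the paper combines these terms is not stated in the abstract.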

Defining and Measuring Disentanglement for non-Independent Factors of Variation

Aug 13, 2024

Antonio Almudévar, Alfonso Ortega, Luis Vicente, Antonio Miguel, Eduardo Lleida

Abstract:Representation learning is an approach that makes it possible to discover and extract the factors of variation from the data. Intuitively, a representation is said to be disentangled if it separates the different factors of variation in a way that is understandable to humans. Definitions of disentanglement and metrics to measure it usually assume that the factors of variation are independent of each other. However, this is generally false in the real world, which limits the use of these definitions and metrics to very specific and unrealistic scenarios. In this paper we give a definition of disentanglement based on information theory that is also valid when the factors of variation are not independent. Furthermore, we relate this definition to the Information Bottleneck Method. Finally, we propose a method to measure the degree of disentanglement from the given definition that works when the factors of variation are not independent. We show through different experiments that the method proposed in this paper correctly measures disentanglement with non-independent factors of variation, while other methods fail in this scenario.

Predefined Prototypes for Intra-Class Separation and Disentanglement

Jun 23, 2024

Antonio Almudévar, Théo Mariotte, Alfonso Ortega, Marie Tahon, Luis Vicente, Antonio Miguel, Eduardo Lleida

Abstract:Prototypical Learning is based on the idea that there is a point (which we call prototype) around which the embeddings of a class are clustered. It has shown promising results in scenarios with little labeled data or to design explainable models. Typically, prototypes are either defined as the average of the embeddings of a class or are designed to be trainable. In this work, we propose to predefine prototypes following human-specified criteria, which simplifies the training pipeline and brings different advantages. Specifically, we explore two of these advantages: increasing the inter-class separability of embeddings and disentangling embeddings with respect to different variance factors, which can translate into the possibility of having explainable predictions. Finally, we present experiments that help to understand our proposal and empirically demonstrate these advantages.

Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

Jun 19, 2024

Martin Lebourdais, Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

Abstract:Audio segmentation is a key task for many speech technologies, most of which are based on neural networks, usually considered as black boxes, with high performance. However, in many domains, such as health or forensics, there is a need not only for good performance but also for explanations of the output decision. Explanations derived directly from latent representations need to satisfy "good" properties, such as informativeness, compactness, or modularity, to be interpretable. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF), which is a good candidate for the design of interpretable representations. This paper shows that our model achieves good segmentation performance, and presents in-depth analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new perspectives toward the evaluation of interpretable representations according to "good" properties.

* Accepted at Interspeech 2024, 5 pages, 2 figures, 3 tables

Unsupervised Multiple Domain Translation through Controlled Disentanglement in Variational Autoencoder

Jan 18, 2024

Antonio Almudévar, Théo Mariotte, Alfonso Ortega, Marie Tahon

Abstract:Unsupervised Multiple Domain Translation is the task of transforming data from one domain to other domains without having paired data to train the systems. Typically, methods based on Generative Adversarial Networks (GANs) are used to address this task. However, our proposal exclusively relies on a modified version of a Variational Autoencoder. This modification consists of the use of two latent variables disentangled in a controlled way by design. One of these latent variables is constrained to depend exclusively on the domain, while the other one must depend on the rest of the variability factors of the data. Additionally, the conditions imposed on the domain latent variable allow for better control and understanding of the latent space. We empirically demonstrate that our approach works on different vision datasets, improving on the performance of other well-known methods. Finally, we prove that, indeed, one of the latent variables stores all the information related to the domain and the other one hardly contains any domain information.

An Explainable Proxy Model for Multilabel Audio Segmentation

Jan 17, 2024

Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

Abstract:Audio signal segmentation is a key task for automatic audio indexing. It consists of detecting the boundaries of class-homogeneous segments in the signal. In many applications, explainable AI is a vital process for transparency of decision-making with machine learning. In this paper, we propose an explainable multilabel segmentation model that solves speech activity (SAD), music (MD), noise (ND), and overlapped speech detection (OSD) simultaneously. This proxy uses non-negative matrix factorization (NMF) to map the embedding used for the segmentation to the frequency domain. Experiments conducted on two datasets show performance similar to that of the pre-trained black box model while showing strong explainability features. Specifically, the frequency bins used for the decision can be easily identified at both the segment level (local explanations) and global level (class prototypes).

* Accepted at ICASSP 2024
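
For readers unfamiliar with the NMF step mentioned in the abstract above, the following is a generic sketch of non-negative matrix factorization using the classic multiplicative updates for the squared Frobenius error. It is not the proxy model from the paper: how the paper ties the factorization to the segmentation embedding is not described here, and the helper names, the `rank`, `n_iter`, `eps`, and `seed` parameters, and the toy matrix are all assumptions for illustration.

```python
import random


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy non-negative "spectrogram": rows are frequency bins, columns are frames.
v = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 1.0, 3.0], [0.0, 2.0, 6.0]]
w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

In a setup like this, each column of `w` can be read as a non-negative spectral pattern over frequency bins and each row of `h` as its activation over time, which is what makes NMF attractive for interpretable representations.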
