Lars Maaløe

Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability

Aug 15, 2024

Joakim Edin, Andreas Geert Motzfeldt, Casper L. Christensen, Tuukka Ruotsalo, Lars Maaløe, Maria Maistro

Abstract: Deep neural network predictions are notoriously difficult to interpret. Feature attribution methods aim to explain these predictions by identifying the contribution of each input feature. Faithfulness, often evaluated using the area over the perturbation curve (AOPC), reflects feature attributions' accuracy in describing the internal mechanisms of deep neural networks. However, many studies rely on AOPC to compare faithfulness across different models, which we show can lead to false conclusions about models' faithfulness. Specifically, we find that AOPC is sensitive to variations in the model, resulting in unreliable cross-model comparisons. Moreover, AOPC scores are difficult to interpret in isolation without knowing the model-specific lower and upper limits. To address these issues, we propose a normalization approach, Normalized AOPC (NAOPC), enabling consistent cross-model evaluations and more meaningful interpretation of individual scores. Our experiments demonstrate that this normalization can radically change AOPC results, questioning the conclusions of earlier studies and offering a more robust framework for assessing feature attribution faithfulness.
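The abstract mentions AOPC and normalizing it by model-specific lower and upper limits but gives no formulas. The following is a minimal sketch under two stated assumptions, not the authors' implementation. First, it assumes one common AOPC formulation: the mean drop in model output as top-ranked features are replaced by a baseline value, averaged over k = 0..K removed features. Second, it assumes the limits are the minimum and maximum AOPC over all feature orderings, which makes NAOPC a min-max rescaling. The function names, the zero baseline and the toy linear model are hypothetical. The brute-force search over permutations is only feasible for a handful of features.

```python
from itertools import permutations
from typing import Callable, Sequence

# A model maps a feature vector to a scalar score (e.g. the predicted-class probability).
Model = Callable[[Sequence[float]], float]


def aopc(model: Model, x: Sequence[float], order: Sequence[int], baseline: float = 0.0) -> float:
    """Area over the perturbation curve (one common formulation, assumed here):
    the mean drop in model output as the features in `order` are replaced by
    `baseline` one at a time, averaged over k = 0..K removed features."""
    original = model(x)
    perturbed = list(x)
    drops = [0.0]  # k = 0: nothing removed yet, so the drop is zero
    for idx in order:
        perturbed[idx] = baseline
        drops.append(original - model(perturbed))
    return sum(drops) / len(drops)


def naopc(model: Model, x: Sequence[float], order: Sequence[int], baseline: float = 0.0) -> float:
    """Min-max normalized AOPC (assumption: the limits are the lowest and highest
    AOPC over all feature orderings). Brute force, so only for tiny inputs."""
    scores = [aopc(model, x, p, baseline) for p in permutations(range(len(x)))]
    lower, upper = min(scores), max(scores)
    if upper == lower:
        raise ValueError("AOPC is identical for every ordering; NAOPC is undefined")
    return (aopc(model, x, order, baseline) - lower) / (upper - lower)


if __name__ == "__main__":
    weights = [3.0, 1.0, -2.0]  # hypothetical toy linear model

    def linear(v: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(weights, v))

    x = [1.0, 1.0, 1.0]
    # Rank features by gradient-times-input attribution (w_i * x_i), largest first.
    order = sorted(range(len(x)), key=lambda i: weights[i] * x[i], reverse=True)
    print(aopc(linear, x, order))   # 2.25
    print(naopc(linear, x, order))  # 1.0
```

For this toy model, the attribution ordering already achieves the highest possible AOPC (2.25), so its NAOPC is 1.0. The worst ordering gives an AOPC of -0.25. The raw value 2.25 carries no such reference point on its own, which is the interpretability gap the abstract describes.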

An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records

Jun 13, 2024

Joakim Edin, Maria Maistro, Lars Maaløe, Lasse Borgholt, Jakob D. Havtorn, Tuukka Ruotsalo

Abstract: Electronic healthcare records are vital for patient safety as they document conditions, plans, and procedures in both free text and medical codes. Language models have significantly enhanced the processing of such records, streamlining workflows and reducing manual data entry, thereby saving healthcare providers significant resources. However, the black-box nature of these models often leaves healthcare professionals hesitant to trust them. State-of-the-art explainability methods increase model transparency but rely on human-annotated evidence spans, which are costly. In this study, we propose an approach to produce plausible and faithful explanations without needing such annotations. We demonstrate on the automated medical coding task that adversarial robustness training improves explanation plausibility and introduce AttInGrad, a new explanation method superior to previous ones. By combining both contributions in a fully unsupervised setup, we produce explanations of quality comparable to, or better than, that of a supervised approach. We release our code and model weights.

Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

Apr 21, 2023

Joakim Edin, Alexander Junge, Jakob D. Havtorn, Lasse Borgholt, Maria Maistro, Tuukka Ruotsalo, Lars Maaløe

Abstract: Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. In previous work, the macro F1 score has been calculated sub-optimally, and our correction doubles it. We contribute a revised model comparison using stratified sampling and identical experimental setups, including hyperparameters and decision boundary tuning. We analyze prediction errors to validate and falsify assumptions of previous works. The analysis confirms that all models struggle with rare codes, while long documents have only a negligible impact. Finally, we present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models. We release our code, model parameters, and new MIMIC-III and MIMIC-IV training and evaluation pipelines to accommodate fair future comparisons.

* 11 pages, 6 figures, to be published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan

Self-Supervised Speech Representation Learning: A Review

May 21, 2022

Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe (+2 more)

Abstract: Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches for self-supervised speech representation learning and their connection to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we review recent efforts on benchmarking learned representations to extend the application beyond speech recognition.

Benchmarking Generative Latent Variable Models for Speech

Apr 05, 2022

Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe

Abstract: Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, which is a widely used metric in the image domain but is rarely, or incomparably, reported for speech models. To assess the quality of the learned representations, we also compare their usefulness for phoneme recognition. Finally, we adapt the Clockwork VAE, a state-of-the-art temporal LVM for video generation, to the speech domain. Despite being autoregressive only in latent space, we find that the Clockwork VAE can outperform previous LVMs and reduce the gap to deterministic models by using a hierarchy of latent variables.

* Accepted at the 2022 ICLR workshop on Deep Generative Models for Highly Structured Data (https://deep-gen-struct.github.io)

Model-agnostic out-of-distribution detection using combined statistical tests

Mar 02, 2022

Federico Bergamin, Pierre-Alexandre Mattei, Jakob D. Havtorn, Hugo Senetaire, Hugo Schmutz, Lars Maaløe, Søren Hauberg, Jes Frellsen

Abstract: We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they can be applied to any differentiable generative model. The idea is to combine a classical parametric test (Rao's score test) with the recently introduced typicality test. These two test statistics are both theoretically well-founded and exploit different sources of information based on the likelihood for the typicality test and its gradient for the score test. We show that combining them using Fisher's method overall leads to a more accurate out-of-distribution test. We also discuss the benefits of casting out-of-distribution detection as a statistical testing problem, noting in particular that false positive rate control can be valuable for practical out-of-distribution detection. Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms without any assumptions on the out-distribution.
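The abstract names the ingredients (a typicality test, Rao's score test and Fisher's method) but gives no formulas. The following is a minimal sketch of the combination step only, under stated assumptions. Each statistic is assumed to be turned into an empirical p-value against held-out in-distribution data, with larger values meaning "more atypical". The helper names and toy numbers are hypothetical, and computing the typicality and score statistics themselves from a generative model is not shown. Fisher's method strictly assumes independent p-values. Two statistics computed from the same model need not meet this assumption, so the combined value should be read as approximate.

```python
import math
from typing import Sequence


def empirical_p_value(stat: float, reference: Sequence[float]) -> float:
    """One-sided empirical p-value (assumption: larger statistic = more atypical).
    `reference` holds the same statistic computed on held-out in-distribution data;
    the +1 terms keep the p-value strictly positive."""
    exceed = sum(1 for r in reference if r >= stat)
    return (1 + exceed) / (1 + len(reference))


def fisher_combine(p_values: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) is chi-squared with 2k degrees of
    freedom under the null, provided the k p-values are independent. For even
    degrees of freedom the survival function has the closed form
    P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # x / 2
    term, total = 1.0, 0.0
    for j in range(len(p_values)):
        if j > 0:
            term *= half_x / j  # (x/2)^j / j! built incrementally
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Hypothetical statistics for one test input and an in-distribution reference set.
    typicality_ref = [0.1, 0.4, 0.2, 0.3, 0.5, 0.2, 0.1, 0.3, 0.6, 0.2]
    score_ref = [1.0, 1.4, 0.9, 1.2, 1.1, 1.3, 0.8, 1.0, 1.5, 1.2]
    p_typ = empirical_p_value(0.55, typicality_ref)  # (1 + 1) / 11
    p_score = empirical_p_value(1.45, score_ref)     # (1 + 1) / 11
    print(fisher_combine([p_typ, p_score]))
```

With a single p-value, the closed form reduces to that p-value itself. When both tests give small p-values, the combined p-value is smaller than either one. Thresholding the combined p-value at a level alpha is what gives the false positive rate control mentioned in the abstract, assuming the null distribution is calibrated.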

* Accepted at the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

A Brief Overview of Unsupervised Neural Speech Representation Learning

Mar 01, 2022

Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel

Abstract: Unsupervised representation learning for speech processing has matured greatly in the last few years. Work in computer vision and natural language processing has paved the way, but speech data offers unique challenges. As a result, methods from other domains rarely translate directly. We review the development of unsupervised representation learning for speech over the last decade. We identify two primary model categories: self-supervised methods and probabilistic latent variable models. We describe the models and develop a comprehensive taxonomy. Finally, we discuss and compare models from the two categories.

* The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing (SAS) at AAAI

Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

Nov 29, 2021

Lasse Borgholt, Jakob Drachmann Havtorn, Mostafa Abdou, Joakim Edin, Lars Maaløe, Anders Søgaard, Christian Igel

Abstract: Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation learning for speech data have focused on improving the ASR component. We investigate whether representation learning for speech has matured enough to replace ASR in SLU. We compare learned speech features from wav2vec 2.0, state-of-the-art ASR transcripts, and the ground truth text as input for a novel speech-based named entity recognition task, a cardiac arrest detection task on real-world emergency calls, and two existing SLU benchmarks. We show that learned speech features are superior to ASR transcripts on three classification tasks. For machine translation, ASR transcripts are still the better choice. We highlight the intrinsic robustness of wav2vec 2.0 representations to out-of-vocabulary words as key to better performance.

* Under review as a conference paper at ICASSP 2022

Hierarchical VAEs Know What They Don't Know

Mar 01, 2021

Jakob D. Havtorn, Jes Frellsen, Søren Hauberg, Lars Maaløe

Abstract: Deep generative models have shown themselves to be state-of-the-art density estimators. Yet, recent work has found that they often assign a higher likelihood to data from outside the training distribution. This seemingly paradoxical behavior has caused concerns over the quality of the attained density estimates. In the context of hierarchical variational autoencoders, we provide evidence that this behavior is explained by out-of-distribution data having in-distribution low-level features. We argue that this is both expected and desirable behavior. With this insight in hand, we develop a fast, scalable and fully unsupervised likelihood-ratio score for OOD detection that requires data to be in-distribution across all feature levels. We benchmark the method on a vast set of data and model combinations and achieve state-of-the-art results on out-of-distribution detection.

* 18 pages, source code available at https://github.com/vlievin/biva-pytorch and https://github.com/larsmaaloee/BIVA

Do End-to-End Speech Recognition Models Care About Context?

Feb 17, 2021

Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel

Abstract: The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual information in the audio input. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model.

* Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356
