Jakob D. Havtorn

An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records

Jun 13, 2024

Joakim Edin, Maria Maistro, Lars Maaløe, Lasse Borgholt, Jakob D. Havtorn, Tuukka Ruotsalo

Abstract: Electronic healthcare records are vital for patient safety as they document conditions, plans, and procedures in both free text and medical codes. Language models have significantly enhanced the processing of such records, streamlining workflows and reducing manual data entry, thereby saving healthcare providers significant resources. However, the black-box nature of these models often leaves healthcare professionals hesitant to trust them. State-of-the-art explainability methods increase model transparency but rely on human-annotated evidence spans, which are costly. In this study, we propose an approach to produce plausible and faithful explanations without needing such annotations. We demonstrate on the automated medical coding task that adversarial robustness training improves explanation plausibility and introduce AttInGrad, a new explanation method superior to previous ones. By combining both contributions in a fully unsupervised setup, we produce explanations of quality comparable to, or better than, that of a supervised approach. We release our code and model weights.

Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

Apr 21, 2023

Joakim Edin, Alexander Junge, Jakob D. Havtorn, Lasse Borgholt, Maria Maistro, Tuukka Ruotsalo, Lars Maaløe

Abstract: Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. In previous work, the macro F1 score has been calculated sub-optimally, and our correction doubles it. We contribute a revised model comparison using stratified sampling and identical experimental setups, including hyperparameters and decision boundary tuning. We analyze prediction errors to validate and falsify assumptions of previous works. The analysis confirms that all models struggle with rare codes, while long documents only have a negligible impact. Finally, we present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models. We release our code, model parameters, and new MIMIC-III and MIMIC-IV training and evaluation pipelines to accommodate fair future comparisons.

* 11 pages, 6 figures, to be published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23-27, 2023, Taipei, Taiwan
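
The abstract above says that macro F1 was calculated sub-optimally in earlier work, but it does not say how. The sketch below is only an illustration under a stated assumption: that one possible cause is averaging over codes that never occur in the evaluation split. The function names and the toy data are hypothetical and are not taken from the paper or its released code.

```python
def per_class_f1(y_true, y_pred, n_classes):
    """Per-class F1 for multi-label data given as lists of label sets."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if c in t and c not in p)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted examples gets F1 = 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1_all_classes(y_true, y_pred, n_classes):
    """Average F1 over every class in the label space (assumed 'naive' variant)."""
    scores = per_class_f1(y_true, y_pred, n_classes)
    return sum(scores) / n_classes


def macro_f1_present_classes(y_true, y_pred, n_classes):
    """Average F1 only over classes that occur in the evaluation labels."""
    present = sorted({c for labels in y_true for c in labels})
    scores = per_class_f1(y_true, y_pred, n_classes)
    return sum(scores[c] for c in present) / len(present)


# Hypothetical toy split: 4 codes exist, but only codes 0 and 1 occur here.
toy_true = [{0, 1}, {0}, {1}]
toy_pred = [{0, 1}, {0}, {1}]
naive = macro_f1_all_classes(toy_true, toy_pred, n_classes=4)
restricted = macro_f1_present_classes(toy_true, toy_pred, n_classes=4)
```

In this toy case the predictions are perfect, yet the first variant is pulled down by the two codes that are absent from the split. Whether this matches the correction made in the paper should be checked against the paper itself.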

Self-Supervised Speech Representation Learning: A Review

May 21, 2022

Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe (+2 more)

Abstract: Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches for self-supervised speech representation learning and their connection to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we review recent efforts on benchmarking learned representations to extend the application beyond speech recognition.

Benchmarking Generative Latent Variable Models for Speech

Apr 05, 2022

Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe

Abstract: Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, which is a widely used metric in the image domain, but rarely, or incomparably, reported for speech models. To assess the quality of the learned representations, we also compare their usefulness for phoneme recognition. Finally, we adapt the Clockwork VAE, a state-of-the-art temporal LVM for video generation, to the speech domain. Despite being autoregressive only in latent space, we find that the Clockwork VAE can outperform previous LVMs and reduce the gap to deterministic models by using a hierarchy of latent variables.

* Accepted at the 2022 ICLR workshop on Deep Generative Models for Highly Structured Data (https://deep-gen-struct.github.io)

Model-agnostic out-of-distribution detection using combined statistical tests

Mar 02, 2022

Federico Bergamin, Pierre-Alexandre Mattei, Jakob D. Havtorn, Hugo Senetaire, Hugo Schmutz, Lars Maaløe, Søren Hauberg, Jes Frellsen

Abstract: We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they can be applied to any differentiable generative model. The idea is to combine a classical parametric test (Rao's score test) with the recently introduced typicality test. These two test statistics are both theoretically well-founded and exploit different sources of information: the likelihood for the typicality test and its gradient for the score test. We show that combining them using Fisher's method overall leads to a more accurate out-of-distribution test. We also discuss the benefits of casting out-of-distribution detection as a statistical testing problem, noting in particular that false positive rate control can be valuable for practical out-of-distribution detection. Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms without any assumptions on the out-distribution.

* Accepted at the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
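
The abstract above mentions combining two test statistics with Fisher's method. As a minimal sketch of that combination step alone, the code below merges p-values using the textbook form of Fisher's method, which assumes the tests are independent under the null hypothesis. The input p-values are hypothetical placeholders; how the paper obtains them from the typicality and score tests is not shown here.

```python
import math


def chi2_sf_even_dof(x, k):
    """Survival function of a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom there is a closed form:
    sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(log p_i)
    is compared against a chi-square with 2 * len(p_values) degrees of freedom.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_dof(statistic, len(p_values))


# Hypothetical p-values, e.g. one per test for a single input.
stat, combined_p = fisher_combine([0.05, 0.05])
```

Flagging an input as out-of-distribution when `combined_p` falls below a chosen level is one way to use the result; under the independence assumption, that level bounds the false positive rate.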

Hierarchical VAEs Know What They Don't Know

Mar 01, 2021

Jakob D. Havtorn, Jes Frellsen, Søren Hauberg, Lars Maaløe

Abstract: Deep generative models have shown themselves to be state-of-the-art density estimators. Yet, recent work has found that they often assign a higher likelihood to data from outside the training distribution. This seemingly paradoxical behavior has caused concerns over the quality of the attained density estimates. In the context of hierarchical variational autoencoders, we provide evidence to explain this behavior by out-of-distribution data having in-distribution low-level features. We argue that this is both expected and desirable behavior. With this insight in hand, we develop a fast, scalable and fully unsupervised likelihood-ratio score for OOD detection that requires data to be in-distribution across all feature-levels. We benchmark the method on a vast set of data and model combinations and achieve state-of-the-art results on out-of-distribution detection.

* 18 pages, source code available at https://github.com/vlievin/biva-pytorch and https://github.com/larsmaaloee/BIVA

MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech

May 12, 2020

Jakob D. Havtorn, Jan Latko, Joakim Edin, Lasse Borgholt, Lars Maaløe, Lorenzo Belgrano, Nicolai F. Jacobsen, Regitze Sdun, Željko Agić

Abstract: We address a challenging and practical task of labeling questions in speech in real time during telephone calls to emergency medical services in English, which embeds within a broader decision support system for emergency call-takers. We propose a novel multimodal approach to real-time sequence labeling in speech. Our model treats speech and its own textual representation as two separate modalities or views, as it jointly learns from streamed audio and its noisy transcription into text via automatic speech recognition. Our results show significant gains from jointly learning from the two modalities when compared to text or audio only, under adverse noise and limited volume of training data. The results generalize to medical symptoms detection where we observe a similar pattern of improvements with multimodal learning.

* Accepted at ACL 2020
