Daniel Beck

FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation

Apr 24, 2025

Yulia Otmakhova, Hung Thinh Truong, Rahmad Mahendra, Zenan Zhai, Rongxin Zhu, Daniel Beck, Jey Han Lau

Abstract: We present FLUKE (Framework for LingUistically-driven and tasK-agnostic robustness Evaluation), a task-agnostic framework for assessing model robustness through systematic minimal variations of test data. FLUKE introduces controlled variations across linguistic levels - from orthography to dialect and style varieties - and leverages large language models (LLMs) with human validation to generate modifications. We demonstrate FLUKE's utility by evaluating both fine-tuned models and LLMs across four diverse NLP tasks, and reveal that (1) the impact of linguistic variations is highly task-dependent, with some tests being critical for certain tasks but irrelevant for others; (2) while LLMs have better overall robustness compared to fine-tuned models, they still exhibit significant brittleness to certain linguistic variations; (3) all models show substantial vulnerability to negation modifications across most tasks. These findings highlight the importance of systematic robustness testing for understanding model behaviors.

PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records

Jul 05, 2023

Viktor Schlegel, Hao Li, Yuping Wu, Anand Subramanian, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Daniel Beck, Xiaojun Zeng, Riza Theresa Batista-Navarro, Stefan Winkler (+1 more)

Abstract: This paper describes PULSAR, our system submission at the ImageClef 2023 MediQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training to produce a specialised language model which is trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence towards the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the best performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.

* 8 pages. ImageClef 2023 MediQA-Sum

PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models

Jun 05, 2023

Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Xiaojun Zeng, Daniel Beck, Stefan Winkler, Goran Nenadic

Abstract: Medical progress notes play a crucial role in documenting a patient's hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation. In this paper, we introduce our proposed approach to this task, which integrates two complementary components. One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list. Our approach was ranked second among all submissions to the shared task. The performance of our model on the development and test datasets shows that our approach is more robust on unknown data, with an improvement of up to 3.1 points over the same size of the larger model.

* Accepted by ACL 2023's workshop BioNLP 2023

Modelling Emotion Dynamics in Song Lyrics with State Space Models

Oct 17, 2022

Yingjin Song, Daniel Beck

Abstract: Most previous work in music emotion recognition assumes a single or a few song-level labels for the whole song. While it is known that different emotions can vary in intensity within a song, annotated data for this setup is scarce and difficult to obtain. In this work, we propose a method to predict emotion dynamics in song lyrics without song-level supervision. We frame each song as a time series and employ a State Space Model (SSM), combining a sentence-level emotion predictor with an Expectation-Maximization (EM) procedure to generate the full emotion dynamics. Our experiments show that applying our method consistently improves the performance of sentence-level baselines without requiring any annotated songs, making it ideal for limited training data scenarios. Further analysis through case studies shows the benefits of our method while also indicating the limitations and pointing to future directions.

* To appear in Transactions of the Association for Computational Linguistics (TACL); 17 pages, 4 figures

Generating Diverse Descriptions from Semantic Graphs

Aug 13, 2021

Jiuzhou Han, Daniel Beck, Trevor Cohn

Abstract: Text generation from semantic graphs is traditionally performed with deterministic methods, which generate a unique description given an input graph. However, the generation problem admits a range of acceptable textual outputs, exhibiting lexical, syntactic and semantic variation. To address this disconnect, we present two main contributions. First, we propose a stochastic graph-to-text model, incorporating a latent variable in an encoder-decoder model, and its use in an ensemble. Second, to assess the diversity of the generated sentences, we propose a new automatic evaluation metric which jointly evaluates output diversity and quality in a multi-reference setting. We evaluate the models on WebNLG datasets in English and Russian, and show an ensemble of stochastic models produces diverse sets of generated sentences, while retaining similar quality to state-of-the-art models.

* INLG 2021

BOSS: Bayesian Optimization over String Spaces

Oct 02, 2020

Henry B. Moss, Daniel Beck, Javier Gonzalez, David S. Leslie, Paul Rayson

Abstract: This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds a powerful Gaussian process surrogate model based on string kernels, naturally supporting variable length inputs, and performs efficient acquisition function maximization for spaces with syntactical constraints. Experiments demonstrate considerably improved optimization over existing approaches across a broad range of constraints, including the popular setting where syntax is governed by a context-free grammar.

A Unified Neural Architecture for Instrumental Audio Tasks

Mar 01, 2019

Steven Spratley, Daniel Beck, Trevor Cohn

Abstract: Within Music Information Retrieval (MIR), prominent tasks -- including pitch-tracking, source-separation, super-resolution, and synthesis -- typically call for specialised methods, despite their similarities. Conditional Generative Adversarial Networks (cGANs) have been shown to be highly versatile in learning general image-to-image translations, but have not yet been adapted across MIR. In this work, we present an end-to-end supervisable architecture to perform all aforementioned audio tasks, consisting of a WaveNet synthesiser conditioned on the output of a jointly-trained cGAN spectrogram translator. In doing so, we demonstrate the potential of such flexible techniques to unify MIR tasks, promote efficient transfer learning, and converge research to the improvement of powerful, general methods. Finally, to the best of our knowledge, we present the first application of GANs to guided instrument synthesis.

* To appear in Proc. ICASSP 2019, May 12-17, Brighton, UK

Graph-to-Sequence Learning using Gated Graph Neural Networks

Jun 26, 2018

Daniel Beck, Gholamreza Haffari, Trevor Cohn

Abstract: Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures on this setting obtained promising results compared to grammar-based approaches but still rely on linearisation heuristics and/or standard recurrent networks to achieve the best performance. In this work, we propose a new model that encodes the full structural information contained in the graph. Our architecture couples the recently proposed Gated Graph Neural Networks with an input transformation that allows nodes and edges to have their own hidden representations, while tackling the parameter explosion problem present in previous work. Experimental results show that our model outperforms strong baselines in generation from AMR graphs and syntax-based neural machine translation.

* ACL 2018

Exploring Prediction Uncertainty in Machine Translation Quality Estimation

Jun 30, 2016

Daniel Beck, Lucia Specia, Trevor Cohn

Abstract: Machine Translation Quality Estimation is a notoriously difficult task, which lessens its usefulness in real-world translation environments. Such scenarios can be improved if quality predictions are accompanied by a measure of uncertainty. However, models in this task are traditionally evaluated only in terms of point estimate metrics, which do not take prediction uncertainty into account. We investigate probabilistic methods for Quality Estimation that can provide well-calibrated uncertainty estimates and evaluate them in terms of their full posterior predictive distributions. We also show how this posterior information can be useful in an asymmetric risk scenario, which aims to capture typical situations in translation workflows.

* Proceedings of CoNLL 2016
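
As a rough illustration of why a full predictive distribution matters under asymmetric risk, the sketch below assumes a Gaussian predictive distribution and a simple asymmetric linear loss. Both are assumptions made for this example and are not necessarily the models or losses used in the paper; the function names and the `over_weight` parameter are hypothetical. Under this loss, the prediction that minimises expected loss is a quantile of the predictive distribution rather than its mean, so two translations with the same predicted mean but different uncertainty receive different decisions.

```python
from statistics import NormalDist


def asymmetric_linear_loss(pred, truth, over_weight=3.0):
    """Linear loss in which over-estimates cost `over_weight` times more
    than under-estimates (hypothetical loss chosen for illustration)."""
    err = pred - truth
    return over_weight * err if err > 0 else -err


def bayes_optimal_prediction(mean, std, over_weight=3.0):
    """Minimiser of the expected asymmetric linear loss when the predictive
    distribution is N(mean, std^2): the quantile at 1 / (1 + over_weight)."""
    tau = 1.0 / (1.0 + over_weight)
    return NormalDist(mean, std).inv_cdf(tau)


# Same predicted mean, different uncertainty: the more uncertain
# prediction is shifted further away from the costly side.
confident = bayes_optimal_prediction(mean=0.5, std=0.05)
uncertain = bayes_optimal_prediction(mean=0.5, std=0.20)
```

With `over_weight=1.0` the loss is symmetric and the optimal prediction reduces to the median, which for a Gaussian equals the mean.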

Speed-Constrained Tuning for Statistical Machine Translation Using Bayesian Optimization

Apr 18, 2016

Daniel Beck, Adrià de Gispert, Gonzalo Iglesias, Aurelien Waite, Bill Byrne

Abstract: We address the problem of automatically finding the parameters of a statistical machine translation system that maximize BLEU scores while ensuring that decoding speed exceeds a minimum value. We propose the use of Bayesian Optimization to efficiently tune the speed-related decoding parameters by easily incorporating speed as a noisy constraint function. The obtained parameter values are guaranteed to satisfy the speed constraint with an associated confidence margin. Across three language pairs and two speed constraint values, we report overall optimization time reduction compared to grid and random search. We also show that Bayesian Optimization can decouple speed and BLEU measurements, resulting in a further reduction of overall optimization time as speed is measured over a small subset of sentences.

* To appear at NAACL 2016
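
The idea of satisfying a noisy speed constraint "with an associated confidence margin" can be sketched as follows. This is a minimal illustration, assuming the surrogate model's belief about decoding speed at a candidate setting is Gaussian; the function names, the 0.95 threshold, and the example speeds are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def prob_speed_ok(mean_speed, std_speed, min_speed):
    """P(speed >= min_speed) when the belief about decoding speed is
    N(mean_speed, std_speed^2). Requires std_speed > 0."""
    return 1.0 - NormalDist(mean_speed, std_speed).cdf(min_speed)


def is_feasible(mean_speed, std_speed, min_speed, confidence=0.95):
    """Accept a candidate only if the constraint holds with at least
    the requested probability under the Gaussian belief."""
    return prob_speed_ok(mean_speed, std_speed, min_speed) >= confidence


# A candidate whose mean speed barely clears the threshold is rejected,
# while one that clears it by three standard deviations is accepted.
borderline = is_feasible(mean_speed=2050.0, std_speed=100.0, min_speed=2000.0)
comfortable = is_feasible(mean_speed=2300.0, std_speed=100.0, min_speed=2000.0)
```

In a constrained Bayesian Optimization loop, a probability of this kind is typically used to weight or filter the acquisition function, so that search effort concentrates on settings likely to meet the constraint.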
