Wenqian Cui

VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models

Jan 09, 2025

Wenqian Cui, Xiaoqi Jiao, Ziqiao Meng, Irwin King

Abstract: With the growing demand for speech-based interaction models, end-to-end Spoken Language Models (SLMs) have emerged as a promising solution. When conversing with humans, these models must comprehend a wide range of world knowledge. In this paper, we introduce VoxEval, a novel speech question-answering benchmark specifically designed to assess SLMs' knowledge understanding through purely speech-based interactions. Unlike existing AudioQA benchmarks, VoxEval keeps both questions and answers in speech format, evaluates model robustness across diverse audio conditions (varying timbres, audio qualities, and speaking styles), and pioneers the assessment of challenging domains such as mathematical problem-solving in spoken format. Our comprehensive evaluation of recent SLMs using VoxEval reveals significant performance limitations in current models, highlighting crucial areas for future improvement.
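Evaluating "robustness across diverse audio conditions" generally means breaking accuracy down per condition rather than reporting a single number. The sketch below is a generic illustration, not VoxEval's evaluation code: the `accuracy_by_condition` helper, the condition names, and the results are all hypothetical, and the best-minus-worst gap is just one crude robustness indicator chosen for illustration.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Group (condition, is_correct) pairs and return accuracy per condition."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for condition, is_correct in records:
        totals[condition] += 1
        correct[condition] += int(is_correct)
    return {c: correct[c] / totals[c] for c in totals}


# Hypothetical graded answers for one model; condition names are illustrative only.
records = [
    ("clean_timbre_a", True), ("clean_timbre_a", True),
    ("noisy_audio", True), ("noisy_audio", False),
    ("fast_speech", False), ("fast_speech", True),
]

per_cond = accuracy_by_condition(records)
# Gap between the best and worst condition: a simple (assumed) robustness indicator.
robustness_gap = max(per_cond.values()) - min(per_cond.values())
print(per_cond, robustness_gap)
```

A model whose overall accuracy looks acceptable can still show a large gap here, which is the kind of weakness a condition-level breakdown is meant to expose.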

Data Augmentation Techniques for Chinese Disease Name Normalization

Jan 02, 2025

Wenqian Cui, Xiangling Fu, Shaohui Liu, Mingjun Gu, Xien Liu, Ji Wu, Irwin King

Abstract: Disease name normalization is an important task in the medical domain. It classifies disease names written in various formats into standardized names, serving as a fundamental component of smart healthcare systems for various disease-related functions. Nevertheless, the most significant obstacle facing existing disease name normalization systems is a severe shortage of training data. To mitigate this problem, we present a novel data augmentation approach comprising a series of data augmentation techniques and supporting modules. Through extensive experimentation, we show that the proposed approach yields significant performance improvements across various baseline models and training objectives, particularly in scenarios with limited training data.

* The Version of Record of this contribution is published in 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2024)

Recent Advances in Speech Language Models: A Survey

Oct 01, 2024

Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Yiwen Guo, Irwin King

Abstract: Large Language Models (LLMs) have recently garnered significant attention, primarily for their capabilities in text-based interactions. However, natural human interaction often relies on speech, necessitating a shift towards voice-based models. A straightforward approach is a pipeline of "Automatic Speech Recognition (ASR) + LLM + Text-to-Speech (TTS)", where input speech is transcribed to text, processed by an LLM, and then converted back to speech. Despite its simplicity, this method suffers from inherent limitations, such as information loss during modality conversion and error accumulation across the three stages. To address these issues, Speech Language Models (SpeechLMs), end-to-end models that generate speech without converting from text, have emerged as a promising alternative. This survey provides the first comprehensive overview of recent methodologies for constructing SpeechLMs, detailing the key components of their architecture and the various training recipes integral to their development. Additionally, we systematically survey the capabilities of SpeechLMs, categorize the evaluation metrics for SpeechLMs, and discuss the challenges and future research directions in this rapidly evolving field.
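To make "error accumulation across the three stages" concrete, here is a toy calculation under a strong simplifying assumption: each stage of the ASR + LLM + TTS cascade fails independently, so the chance that the whole pipeline is correct is the product of the per-stage accuracies. The accuracies below are made-up numbers, not figures from the survey, and the model ignores the separate problem of information loss (e.g. prosody discarded at the ASR step).

```python
import math


def cascade_success_rate(stage_accuracies):
    """Probability that every stage is correct, assuming independent errors."""
    return math.prod(stage_accuracies)


# Hypothetical per-stage accuracies for ASR, LLM, and TTS.
stages = {"asr": 0.95, "llm": 0.90, "tts": 0.98}
end_to_end = cascade_success_rate(stages.values())
print(f"{end_to_end:.4f}")  # 0.95 * 0.90 * 0.98 = 0.8379
```

Even with every stage at 90% or better, roughly one request in six fails somewhere in the chain, which is the intuition behind preferring a single end-to-end model.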

* Work in progress

MoodLoopGP: Generating Emotion-Conditioned Loop Tablature Music with Multi-Granular Features

Jan 25, 2024

Wenqian Cui, Pedro Sarmento, Mathieu Barthet

Abstract: Loopable music generation systems enable diverse applications, but they often lack controllability and customization capabilities. We argue that enhancing controllability can enrich these models, with emotional expression being a crucial aspect for both creators and listeners. Hence, building upon LooperGP, a loopable tablature generation model, this paper explores endowing systems with control over conveyed emotions. To enable such conditional generation, we propose integrating musical knowledge by utilizing multi-granular semantic and musical features during model training and inference. Specifically, we incorporate song-level features (Emotion Labels, Tempo, and Mode) and bar-level features (Tonal Tension) together to guide emotional expression. Through algorithmic and human evaluations, we demonstrate the approach's effectiveness in producing music conveying two contrasting target emotions, happiness and sadness. An ablation study is also conducted to clarify the contributing factors behind our approach's results.

* This preprint is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). The Version of Record of this contribution is published in Proceedings of EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2024

Exploring semantic information in disease: Simple Data Augmentation Techniques for Chinese Disease Normalization

Jun 02, 2023

Wenqian Cui, Shaohui Liu, Xiangling Fu, Xien Liu, Ji Wu

Abstract: Disease is a core concept in the medical field, and normalizing disease names is the basis of all disease-related tasks. However, because disease names are multi-axis and multi-grained, general text data augmentation techniques often inject incorrect information and harm performance. To address this problem, we propose a set of data augmentation techniques that work together as an augmented training task for disease normalization. Our data augmentation methods draw on both a clinical disease corpus and a standard disease corpus derived from ICD-10 coding. Extensive experiments demonstrate the effectiveness of the proposed methods: they achieve up to a 3% performance gain over non-augmented counterparts and work even better on smaller datasets.
