Title: Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

Aug 07, 2024

Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri, Laurent Besacier

Abstract: We present Speech-MASSIVE, a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSIVE textual corpus. Speech-MASSIVE covers 12 languages from different families and inherits from MASSIVE the annotations for the intent prediction and slot-filling tasks. Our extension is prompted by the scarcity of massively multilingual SLU datasets and the growing need for versatile speech datasets to assess foundation models (LLMs, speech encoders) across languages and tasks. We provide a multimodal, multitask, multilingual dataset and report SLU baselines using both cascaded and end-to-end architectures in various training scenarios (zero-shot, few-shot, and full fine-tuning). Furthermore, we demonstrate the suitability of Speech-MASSIVE for benchmarking other tasks such as speech transcription, language identification, and speech translation. The dataset, models, and code are publicly available at: https://github.com/hlt-mt/Speech-MASSIVE

* Accepted at INTERSPEECH 2024. This version has the same content, with additional appendices.
