Joshua Nemecek

Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks

Oct 26, 2022

Colin Leong, Joshua Nemecek, Jacob Mansdorfer, Anna Filighera, Abraham Owodunni, Daniel Whitenack

Abstract: We present Bloom Library, a linguistically diverse set of multimodal and multilingual datasets for language modeling, image captioning, visual storytelling, and speech synthesis/recognition. These datasets represent either the most, or among the most, multilingual datasets for each of the included downstream tasks. In total, the initial release of the Bloom Library datasets covers 363 languages across 32 language families. We train downstream task models for various languages represented in the data, showing the viability of the data for future work in low-resource, multimodal NLP and establishing the first known baselines for these downstream tasks in certain languages (e.g., Bisu [bzi], with an estimated population of 700 users). Some of these first-of-their-kind baselines are comparable to state-of-the-art performance for higher-resourced languages. The Bloom Library datasets are released under Creative Commons licenses on the Hugging Face datasets hub to catalyze more linguistically diverse research in the included downstream tasks.

* EMNLP 2022
* 14 pages, 1 figure, 3 tables, accepted to and presented at EMNLP 2022

Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification

Aug 04, 2021

Sangeeta Ghangam, Daniel Whitenack, Joshua Nemecek

Abstract: Running automatic speech recognition (ASR) on edge devices is non-trivial due to resource constraints, especially in scenarios that require supporting multiple languages. We propose a new approach to enable multilingual speech recognition on edge devices. This approach uses both language identification and accent identification to select one of multiple monolingual ASR models on the fly, each fine-tuned for a particular accent. Initial results for both recognition performance and resource usage are promising, with our approach using less than 1/12th of the memory consumed by other solutions.

* Accepted to IEEE WF-IOT 2021
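
The selection step described in the Dyn-ASR abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DynamicASRRouter`, the callback signatures, and the policy of keeping only one monolingual model resident at a time are all assumptions made for illustration, and the language and accent identifiers are left as pluggable callables rather than real models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type aliases, not taken from the paper.
ModelKey = Tuple[str, str]            # (language, accent)
Transcriber = Callable[[bytes], str]  # audio in, text out


@dataclass
class DynamicASRRouter:
    """Pick a monolingual, accent-specific ASR model per utterance."""

    identify_language: Callable[[bytes], str]
    identify_accent: Callable[[bytes, str], str]
    # Each loader builds its model lazily, so unused models cost no memory.
    loaders: Dict[ModelKey, Callable[[], Transcriber]]
    _active_key: Optional[ModelKey] = field(default=None, init=False)
    _active_model: Optional[Transcriber] = field(default=None, init=False)
    load_count: int = field(default=0, init=False)

    def transcribe(self, audio: bytes) -> str:
        language = self.identify_language(audio)
        accent = self.identify_accent(audio, language)
        key = (language, accent)
        if key not in self.loaders:
            raise KeyError(f"no ASR model registered for {key}")
        if key != self._active_key:
            # Assumption: release the previous model before loading the
            # next one, so at most one ASR model is resident at a time.
            self._active_model = None
            self._active_model = self.loaders[key]()
            self._active_key = key
            self.load_count += 1
        return self._active_model(audio)
```

In this sketch a model is reloaded only when the detected (language, accent) pair changes between utterances, so consecutive utterances from the same speaker reuse the resident model.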
