Title: Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models

Sep 17, 2024

Potsawee Manakul, Guangzhi Sun, Warit Sirichotedumrong, Kasima Tharnpipitchai, Kunat Pipatanakul

Abstract: Audio language models can understand audio inputs and perform a range of audio-related tasks based on instructions, such as speech recognition and audio captioning, where the instructions are usually textual prompts. Audio language models are mostly initialized from pre-trained audio encoders and large language models (LLMs). Although these pre-trained components were developed to support multiple languages, audio language models are trained predominantly on English data, which may limit their usability to only English instructions or English speech inputs. First, this paper examines the performance of existing audio language models in an underserved language, using Thai as an example. It demonstrates that, despite being built on multilingual backbones, audio language models do not exhibit emergent cross-lingual abilities in low-resource languages. Second, this paper studies data mixtures for developing audio language models that are optimized for a target language as well as English. In addition, this paper integrates audio comprehension and speech instruction-following capabilities into a single unified model. Our experiments provide insights into data mixtures for enhancing instruction-following capabilities in both a low-resource language and English. Our model, Typhoon-Audio, outperforms existing open-source audio language models by a considerable margin, and it is comparable to the state-of-the-art Gemini-1.5-Pro in both English and Thai.

* 5 pages. Preprint under review
