Title: MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders

Sep 10, 2024

Wenyu Zhang, Shuo Sun, Bin Wang, Xunlong Zou, Zhuohan Liu, Yingxu He, Geyu Lin, Nancy F. Chen, Ai Ti Aw

Abstract: The rapid advancements in large language models (LLMs) have significantly enhanced natural language processing capabilities, facilitating the development of AudioLLMs that process and understand speech and audio inputs alongside text. Existing AudioLLMs typically combine a pre-trained audio encoder with a pre-trained LLM, which are subsequently fine-tuned on specific audio tasks. However, the pre-trained audio encoder has limited capacity to capture features for new tasks and datasets. To address this, we propose incorporating mixtures of 'weak' encoders (MoWE) into the AudioLLM framework. MoWE supplements a base encoder with a pool of relatively lightweight encoders, which are selectively activated based on the audio input to enhance feature extraction without significantly increasing model size. Our empirical results demonstrate that MoWE effectively improves multi-task performance, broadening the applicability of AudioLLMs to more diverse audio tasks.
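The selective activation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how encoders are chosen or how their outputs are merged, so the softmax router, the top-k selection, the concatenation with the base features, and every name below (`make_linear`, `mowe_forward`, `top_k`) are assumptions made purely for illustration. Random linear maps stand in for real audio encoders.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(in_dim, out_dim, rng):
    """Toy stand-in for an encoder: a fixed random linear map (hypothetical)."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


def mowe_forward(x, base_encoder, weak_encoders, router, top_k=2):
    """Combine base features with the top_k routed weak encoders.

    Assumptions (not stated in the abstract): the router yields one score per
    weak encoder, gate weights come from a softmax, and the mixed weak
    features are concatenated to the base features. Requires top_k >= 1.
    """
    gate = softmax(router(x))
    # Indices of the top_k weak encoders with the highest gate weight.
    active = sorted(range(len(weak_encoders)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in active)

    mixed = None
    for i in active:  # only the selected weak encoders are evaluated
        out = [(gate[i] / norm) * o for o in weak_encoders[i](x)]
        mixed = out if mixed is None else [m + o for m, o in zip(mixed, out)]

    # The base encoder always runs; weak features are appended to its output.
    return base_encoder(x) + mixed, active


rng = random.Random(0)
in_dim, base_dim, weak_dim, n_weak = 8, 6, 3, 4
base = make_linear(in_dim, base_dim, rng)
pool = [make_linear(in_dim, weak_dim, rng) for _ in range(n_weak)]
router = make_linear(in_dim, n_weak, rng)

frame = [rng.uniform(-1, 1) for _ in range(in_dim)]  # one toy input frame
features, active = mowe_forward(frame, base, pool, router, top_k=2)
print(len(features), active)
```

In this sketch the base encoder runs on every input while only `top_k` of the weak encoders are evaluated, which is one way to add capacity without paying for the whole pool on each input.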
