Title: LLark: A Multimodal Foundation Model for Music

Oct 11, 2023

Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner

Abstract: Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for music understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLark, integrating a pretrained generative model for music with a pretrained language model. In evaluations on three types of tasks (music understanding, captioning, and reasoning), we show that our model matches or outperforms existing baselines in zero-shot generalization for music understanding, and that humans show a high degree of agreement with the model's responses in captioning and reasoning tasks. LLark is trained entirely from open-source music data and models, and we make our training code available along with the release of this paper. Additional results and audio examples are at https://bit.ly/llark, and our source code is available at https://github.com/spotify-research/llark.
